1. 10 2月, 2014 1 次提交
    • A
      fix O_SYNC|O_APPEND syncing the wrong range on write() · d311d79d
      Al Viro 提交于
      It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
      when sync_page_range() had been introduced; generic_file_write{,v}() correctly
      synced
      	pos_after_write - written .. pos_after_write - 1
      but generic_file_aio_write() synced
      	pos_before_write .. pos_before_write + written - 1
      instead.  Which is not the same thing with O_APPEND, obviously.
      A couple of years later correct variant had been killed off when
      everything switched to use of generic_file_aio_write().
      
      All users of generic_file_aio_write() are affected, and the same bug
      has been copied into other instances of ->aio_write().
      
      The fix is trivial; the only subtle point is that generic_write_sync()
      ought to be inlined to avoid calculations useless for the majority of
      calls.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d311d79d
  2. 07 2月, 2014 1 次提交
    • S
      swap: add a simple detector for inappropriate swapin readahead · 579f8290
      Shaohua Li 提交于
      This is a patch to improve swap readahead algorithm.  It's from Hugh and
      I slightly changed it.
      
      Hugh's original changelog:
      
      swapin readahead does a blind readahead, whether or not the swapin is
      sequential.  This may be ok on harddisk, because large reads have
      relatively small costs, and if the readahead pages are unneeded they can
      be reclaimed easily - though, what if their allocation forced reclaim of
      useful pages? But on SSD devices large reads are more expensive than
      small ones: if the readahead pages are unneeded, reading them in caused
      significant overhead.
      
      This patch adds very simplistic random read detection.  Stealing the
      PageReadahead technique from Konstantin Khlebnikov's patch, avoiding the
      vma/anon_vma sophistications of Shaohua Li's patch, swapin_nr_pages()
      simply looks at readahead's current success rate, and narrows or widens
      its readahead window accordingly.  There is little science to its
      heuristic: it's about as stupid as can be whilst remaining effective.
      
      The table below shows elapsed times (in centiseconds) when running a
      single repetitive swapping load across a 1000MB mapping in 900MB ram
      with 1GB swap (the harddisk tests had taken painfully too long when I
      used mem=500M, but SSD shows similar results for that).
      
      Vanilla is the 3.6-rc7 kernel on which I started; Shaohua denotes his
      Sep 3 patch in mmotm and linux-next; HughOld denotes my Oct 1 patch
      which Shaohua showed to be defective; HughNew this Nov 14 patch, with
      page_cluster as usual at default of 3 (8-page reads); HughPC4 this same
      patch with page_cluster 4 (16-page reads); HughPC0 with page_cluster 0
      (1-page reads: no readahead).
      
      HDD for swapping to harddisk, SSD for swapping to VertexII SSD.  Seq for
      sequential access to the mapping, cycling five times around; Rand for
      the same number of random touches.  Anon for a MAP_PRIVATE anon mapping;
      Shmem for a MAP_SHARED anon mapping, equivalent to tmpfs.
      
      One weakness of Shaohua's vma/anon_vma approach was that it did not
      optimize Shmem: seen below.  Konstantin's approach was perhaps mistuned,
      50% slower on Seq: did not compete and is not shown below.
      
      HDD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
      Seq Anon     73921   76210   75611   76904   78191  121542
      Seq Shmem    73601   73176   73855   72947   74543  118322
      Rand Anon   895392  831243  871569  845197  846496  841680
      Rand Shmem 1058375 1053486  827935  764955  764376  756489
      
      SSD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
      Seq Anon     24634   24198   24673   25107   21614   70018
      Seq Shmem    24959   24932   25052   25703   22030   69678
      Rand Anon    43014   26146   28075   25989   26935   25901
      Rand Shmem   45349   45215   28249   24268   24138   24332
      
      These tests are, of course, two extremes of a very simple case: under
      heavier mixed loads I've not yet observed any consistent improvement or
      degradation, and wider testing would be welcome.
      
      Shaohua Li:
      
      Test shows Vanilla is slightly better in sequential workload than Hugh's
      patch.  I observed with Hugh's patch sometimes the readahead size is
      shrinked too fast (from 8 to 1 immediately) in sequential workload if
      there is no hit.  And in such case, continuing doing readahead is good
      actually.
      
      I don't prepare a sophisticated algorithm for the sequential workload
      because so far we can't guarantee sequential accessed pages are swap out
      sequentially.  So I slightly change Hugh's heuristic - don't shrink
      readahead size too fast.
      
      Here is my test result (unit second, 3 runs average):
      	Vanilla		Hugh		New
      Seq	356		370		360
      Random	4525		2447		2444
      
      Attached graph is the swapin/swapout throughput I collected with 'vmstat
      2'.  The first part is running a random workload (till around 1200 of
      the x-axis) and the second part is running a sequential workload.
      swapin and swapout throughput are almost identical in steady state in
      both workloads.  These are expected behavior.  while in Vanilla, swapin
      is much bigger than swapout especially in random workload (because wrong
      readahead).
      
      Original patches by: Shaohua Li and Konstantin Khlebnikov.
      
      [fengguang.wu@intel.com: swapin_nr_pages() can be static]
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NShaohua Li <shli@fusionio.com>
      Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      579f8290
  3. 06 2月, 2014 4 次提交
  4. 05 2月, 2014 15 次提交
    • J
      wireless: sort and extend element ID list · 8c78e380
      Johannes Berg 提交于
      The element ID list is currently almost sorted by amendment
      or similar topic, but the order is difficult to maintain and
      not very transparent. Sort the list by ID instead, and add
      a lot of missing IDs.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      8c78e380
    • J
      cfg80211: regulatory introduce maximum bandwidth calculation · 97524820
      Janusz Dziedzic 提交于
      In case we will get regulatory request with rule
      where max_bandwidth_khz is set to 0 handle this
      case as a special one.
      
      If max_bandwidth_khz == 0 we should calculate maximum
      available bandwidth base on all frequency contiguous rules.
      In case we need auto calculation we just have to set:
      
      country PL: DFS-ETSI
              (2402 - 2482 @ 40), (N/A, 20)
              (5170 - 5250 @ AUTO), (N/A, 20)
              (5250 - 5330 @ AUTO), (N/A, 20), DFS
              (5490 - 5710 @ 80), (N/A, 27), DFS
      
      This mean we will calculate maximum bw for rules where
      AUTO (N/A) were set, 160MHz (5330 - 5170) in example above.
      So we will get:
              (5170 - 5250 @ 160), (N/A, 20)
              (5250 - 5330 @ 160), (N/A, 20), DFS
      
      In other case:
      country FR: DFS-ETSI
              (2402 - 2482 @ 40), (N/A, 20)
              (5170 - 5250 @ AUTO), (N/A, 20)
              (5250 - 5330 @ 80), (N/A, 20), DFS
              (5490 - 5710 @ 80), (N/A, 27), DFS
      
      We will get 80MHz (5250 - 5170):
              (5170 - 5250 @ 80), (N/A, 20)
              (5250 - 5330 @ 80), (N/A, 20), DFS
      
      Base on this calculations we will set correct channel
      bandwidth flags (eg. IEEE80211_CHAN_NO_80MHZ).
      
      We don't need any changes in CRDA or internal regulatory.
      Signed-off-by: NJanusz Dziedzic <janusz.dziedzic@tieto.com>
      [extend nl80211 description a bit, fix typo]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      97524820
    • M
      cfg80211: consider existing DFS interfaces · 9e0e2961
      Michal Kazior 提交于
      It was possible to break interface combinations in
      the following way:
      
       combo 1: iftype = AP, num_ifaces = 2, num_chans = 2,
       combo 2: iftype = AP, num_ifaces = 1, num_chans = 1, radar = HT20
      
      With the above interface combinations it was
      possible to:
      
       step 1. start AP on DFS channel by matching combo 2
       step 2. start AP on non-DFS channel by matching combo 1
      
      This was possible beacuse (step 2) did not consider
      if other interfaces require radar detection.
      
      The patch changes how cfg80211 tracks channels -
      instead of channel itself now a complete chandef
      is stored.
      Signed-off-by: NMichal Kazior <michal.kazior@tieto.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      9e0e2961
    • A
      cfg80211: fix channel configuration in IBSS join · fe94f3a4
      Antonio Quartulli 提交于
      When receiving an IBSS_JOINED event select the BSS object
      based on the {bssid, channel} couple rather than the bssid
      only.
      With the current approach if another cell having the same
      BSSID (but using a different channel) exists then cfg80211
      picks up the wrong BSS object.
      The result is a mismatching channel configuration between
      cfg80211 and the driver, that can lead to any sort of
      problem.
      
      The issue can be triggered by having an IBSS sitting on
      given channel and then asking the driver to create a new
      cell using the same BSSID but with a different frequency.
      By passing the channel to cfg80211_get_bss() we can solve
      this ambiguity and retrieve/create the correct BSS object.
      All the users of cfg80211_ibss_joined() have been changed
      accordingly.
      
      Moreover WARN when cfg80211_ibss_joined() gets a NULL
      channel as argument and remove a bogus call of the same
      function in ath6kl (it does not make sense to call
      cfg80211_ibss_joined() with a zero BSSID on ibss-leave).
      
      Cc: Kalle Valo <kvalo@qca.qualcomm.com>
      Cc: Arend van Spriel <arend@broadcom.com>
      Cc: Bing Zhao <bzhao@marvell.com>
      Cc: Jussi Kivilinna <jussi.kivilinna@iki.fi>
      Cc: libertas-dev@lists.infradead.org
      Acked-by: NKalle Valo <kvalo@qca.qualcomm.com>
      Signed-off-by: NAntonio Quartulli <antonio@open-mesh.com>
      [minor code cleanup in ath6kl]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      fe94f3a4
    • J
      mac80211: fix bufferable MMPDU RX handling · b4ba544c
      Johannes Berg 提交于
      Action, disassoc and deauth frames are bufferable, and as such don't
      have the PM bit in the frame control field reserved which means we
      need to react to the bit when receiving in such a frame.
      
      Fix this by introducing a new helper ieee80211_is_bufferable_mmpdu()
      and using it for the RX path that currently ignores the PM bit in
      any non-data frames for doze->wake transitions, but listens to it in
      all frames for wake->doze transitions, both of which are wrong.
      
      Also use the new helper in the TX path to clean up the code.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      b4ba544c
    • J
      nl80211: fix scheduled scan RSSI matchset attribute confusion · ea73cbce
      Johannes Berg 提交于
      The scheduled scan matchsets were intended to be a list of filters,
      with the found BSS having to pass at least one of them to be passed
      to the host. When the RSSI attribute was added, however, this was
      broken and currently wpa_supplicant adds that attribute in its own
      matchset; however, it doesn't intend that to mean that anything
      that passes the RSSI filter should be passed to the host, instead
      it wants it to mean that everything needs to also have higher RSSI.
      
      This is semantically problematic because we have a list of filters
      like [ SSID1, SSID2, SSID3, RSSI ] with no real indication which
      one should be OR'ed and which one AND'ed.
      
      To fix this, move the RSSI filter attribute into each matchset. As
      we need to stay backward compatible, treat a matchset with only the
      RSSI attribute as a "default RSSI filter" for all other matchsets,
      but only if there are other matchsets (an RSSI-only matchset by
      itself is still desirable.)
      
      To make driver implementation easier, keep a global min_rssi_thold
      for the entire request as well. The only affected driver is ath6kl.
      
      I found this when I looked into the code after Raja Mani submitted
      a patch fixing the n_match_sets calculation to disregard the RSSI,
      but that patch didn't address the semantic issue.
      Reported-by: NRaja Mani <rmani@qti.qualcomm.com>
      Acked-by: NLuciano Coelho <luciano.coelho@intel.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      ea73cbce
    • J
      mac80211: add length check in ieee80211_is_robust_mgmt_frame() · d8ca16db
      Johannes Berg 提交于
      A few places weren't checking that the frame passed to the
      function actually has enough data even though the function
      clearly documents it must have a payload byte. Make this
      safer by changing the function to take an skb and checking
      the length inside. The old version is preserved for now as
      the rtl* drivers use it and don't have a correct skb.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      d8ca16db
    • J
      mac80211: remove module handling from rate control ops · cc01f9b5
      Johannes Berg 提交于
      There's not a single rate control algorithm actually in
      a separate module where the module refcount would be
      required. Similarly, there's no specific rate control
      module.
      
      Therefore, all the module handling code in rate control
      is really just dead code, so remove it.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      cc01f9b5
    • J
      mac80211: make rate control ops const · 631ad703
      Johannes Berg 提交于
      Change the code to allow making all the rate control ops
      const, nothing ever needs to change them. Also change all
      drivers to make use of this and mark the ops const.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      631ad703
    • J
      nl80211: add Guard Interval support for set_bitrate_mask · 0b9323f6
      Janusz Dziedzic 提交于
      Allow to force SGI, LGI.
      Mainly for test purpose.
      Signed-off-by: NJanusz Dziedzic <janusz.dziedzic@tieto.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      0b9323f6
    • J
      cfg80211: make connect ie param const · 4b5800fe
      Johannes Berg 提交于
      This required liberally sprinkling 'const' over brcmfmac
      and mwifiex but seems like a useful thing to do since the
      pointer can't really be written.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      4b5800fe
    • J
      cfg80211: Clean up connect params and channel fetching · 664834de
      Jouni Malinen 提交于
      Addition of the frequency hints showed up couple of places in cfg80211
      where pointers could be marked const and a shared function could be used
      to fetch a valid channel.
      Signed-off-by: NJouni Malinen <j@w1.fi>
      [fix mwifiex]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      664834de
    • J
      cfg80211: Advertise maximum associated STAs in AP mode · b43504cf
      Jouni Malinen 提交于
      This allows drivers to advertise the maximum number of associated
      stations they support in AP mode (including P2P GO). User space
      applications can use this for cleaner way of handling the limit (e.g.,
      hostapd rejecting IEEE 802.11 authentication without manual
      configuration of the limit) or to figure out what type of use cases can
      be executed with multiple devices before trying and failing.
      Signed-off-by: NJouni Malinen <j@w1.fi>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      b43504cf
    • J
      cfg80211: Allow BSS hint to be provided for connect · 1df4a510
      Jouni Malinen 提交于
      This clarifies the expected driver behavior on the older
      NL80211_ATTR_MAC and NL80211_ATTR_WIPHY_FREQ attributes and adds a new
      set of similar attributes with _HINT postfix to enable use of a
      recommendation of the initial BSS to choose. This can be helpful for
      some drivers that can avoid an additional full scan on connection
      request if the information is provided to them (user space tools like
      wpa_supplicant already has that information available based on earlier
      scans).
      
      In addition, this can be used to get more expected behavior for cases
      where a specific BSS should be picked first based on operations like
      Interworking network selection or WPS. These cases were already easily
      addressed with drivers that leave BSS selection to user space, but there
      was no convenient way to do this with drivers that take care of BSS
      selection internally without using the NL80211_ATTR_MAC which is not
      really desired since it is needed for other purposes to force the
      association to remain with the same BSS.
      Signed-off-by: NJouni Malinen <j@w1.fi>
      [add const, fix policy]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      1df4a510
    • L
      mac80211: only set CSA beacon when at least one beacon must be transmitted · 66e01cf9
      Luciano Coelho 提交于
      A beacon should never have a Channel Switch Announcement information
      element with a count of 0, because a count of 1 means switch just
      before the next beacon.  So, if a count of 0 was valid in a beacon, it
      would have been transmitted in the next channel already, which is
      useless.  A CSA count equal to zero is only meaningful in action
      frames or probe_responses.
      
      Fix the ieee80211_csa_is_complete() and ieee80211_update_csa()
      functions accordingly.
      
      With a CSA count of 0, we won't transmit any CSA beacons, because the
      switch will happen before the next TBTT.  To avoid extra work and
      potential confusion in the drivers, complete the CSA immediately,
      instead of waiting for the driver to call ieee80211_csa_finish().
      
      To keep things simpler, we also switch immediately when the CSA count
      is 1, while in theory we should delay the switch until just before the
      next TBTT.
      
      Additionally, move the ieee80211_csa_finish() function to cfg.c,
      where it makes more sense.
      Tested-by: NSimon Wunderlich <sw@simonwunderlich.de>
      Acked-by: NSimon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: NLuciano Coelho <luciano.coelho@intel.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      66e01cf9
  5. 03 2月, 2014 1 次提交
  6. 02 2月, 2014 1 次提交
  7. 01 2月, 2014 1 次提交
  8. 31 1月, 2014 6 次提交
    • Z
      xen/grant-table: Avoid m2p_override during mapping · 08ece5bb
      Zoltan Kiss 提交于
      The grant mapping API does m2p_override unnecessarily: only gntdev needs it,
      for blkback and future netback patches it just cause a lock contention, as
      those pages never go to userspace. Therefore this series does the following:
      - the original functions were renamed to __gnttab_[un]map_refs, with a new
        parameter m2p_override
      - based on m2p_override either they follow the original behaviour, or just set
        the private flag and call set_phys_to_machine
      - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
        m2p_override false
      - a new function gnttab_[un]map_refs_userspace provides the old behaviour
      
      It also removes a stray space from page.h and change ret to 0 if
      XENFEAT_auto_translated_physmap, as that is the only possible return value
      there.
      
      v2:
      - move the storing of the old mfn in page->index to gnttab_map_refs
      - move the function header update to a separate patch
      
      v3:
      - a new approach to retain old behaviour where it needed
      - squash the patches into one
      
      v4:
      - move out the common bits from m2p* functions, and pass pfn/mfn as parameter
      - clear page->private before doing anything with the page, so m2p_find_override
        won't race with this
      
      v5:
      - change return value handling in __gnttab_[un]map_refs
      - remove a stray space in page.h
      - add detail why ret = 0 now at some places
      
      v6:
      - don't pass pfn to m2p* functions, just get it locally
      Signed-off-by: NZoltan Kiss <zoltan.kiss@citrix.com>
      Suggested-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      08ece5bb
    • D
      mm: sl[uo]b: fix misleading comments · 433a91ff
      Dave Hansen 提交于
      On x86, SLUB creates and handles <=8192-byte allocations internally.
      It passes larger ones up to the allocator.  Saying "up to order 2" is,
      at best, ambiguous.  Is that order-1?  Or (order-2 bytes)?  Make
      it more clear.
      
      SLOB commits a similar sin.  It *handles* page-size requests, but the
      comment says that it passes up "all page size and larger requests".
      
      SLOB also swaps around the order of the very-similarly-named
      KMALLOC_SHIFT_HIGH and KMALLOC_SHIFT_MAX #defines.  Make it
      consistent with the order of the other two allocators.
      
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: NChristoph Lameter <cl@linux-foundation.org>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      433a91ff
    • M
      zsmalloc: add copyright · 31fc00bb
      Minchan Kim 提交于
      Add my copyright to the zsmalloc source code which I maintain.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      31fc00bb
    • M
      zsmalloc: move it under mm · bcf1647d
      Minchan Kim 提交于
      This patch moves zsmalloc under mm directory.
      
      Before that, description will explain why we have needed custom
      allocator.
      
      Zsmalloc is a new slab-based memory allocator for storing compressed
      pages.  It is designed for low fragmentation and high allocation success
      rate on large object, but <= PAGE_SIZE allocations.
      
      zsmalloc differs from the kernel slab allocator in two primary ways to
      achieve these design goals.
      
      zsmalloc never requires high order page allocations to back slabs, or
      "size classes" in zsmalloc terms.  Instead it allows multiple
      single-order pages to be stitched together into a "zspage" which backs
      the slab.  This allows for higher allocation success rate under memory
      pressure.
      
      Also, zsmalloc allows objects to span page boundaries within the zspage.
      This allows for lower fragmentation than could be had with the kernel
      slab allocator for objects between PAGE_SIZE/2 and PAGE_SIZE.  With the
      kernel slab allocator, if a page compresses to 60% of it original size,
      the memory savings gained through compression is lost in fragmentation
      because another object of the same size can't be stored in the leftover
      space.
      
      This ability to span pages results in zsmalloc allocations not being
      directly addressable by the user.  The user is given an
      non-dereferencable handle in response to an allocation request.  That
      handle must be mapped, using zs_map_object(), which returns a pointer to
      the mapped region that can be used.  The mapping is necessary since the
      object data may reside in two different noncontigious pages.
      
      The zsmalloc fulfills the allocation needs for zram perfectly
      
      [sjenning@linux.vnet.ibm.com: borrow Seth's quote]
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NNitin Gupta <ngupta@vflare.org>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bcf1647d
    • C
      kernel: use lockless list for smp_call_function_single · 6897fc22
      Christoph Hellwig 提交于
      Make smp_call_function_single and friends more efficient by using a
      lockless list.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6897fc22
    • Y
      memblock, bootmem: restore goal for alloc_low · 07bacb38
      Yinghai Lu 提交于
      Now we have memblock_virt_alloc_low to replace original bootmem api in
      swiotlb.
      
      But we should not use BOOTMEM_LOW_LIMIT for arch that does not support
      CONFIG_NOBOOTMEM, as old api take 0.
      
      | #define alloc_bootmem_low(x) \
      |        __alloc_bootmem_low(x, SMP_CACHE_BYTES, 0)
      |#define alloc_bootmem_low_pages_nopanic(x) \
      |        __alloc_bootmem_low_nopanic(x, PAGE_SIZE, 0)
      
      and we have
       #define BOOTMEM_LOW_LIMIT __pa(MAX_DMA_ADDRESS)
      for CONFIG_NOBOOTMEM.
      
      Restore goal to 0 to fix ia64 crash, that Tony found.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Reported-by: NTony Luck <tony.luck@gmail.com>
      Tested-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      07bacb38
  9. 30 1月, 2014 6 次提交
  10. 29 1月, 2014 4 次提交
    • J
      fsnotify: Do not return merged event from fsnotify_add_notify_event() · 83c0e1b4
      Jan Kara 提交于
      The event returned from fsnotify_add_notify_event() cannot ever be used
      safely as the event may be freed by the time the function returns (after
      dropping notification_mutex). So change the prototype to just return
      whether the event was added or merged into some existing event.
      Reported-and-tested-by: NJiri Kosina <jkosina@suse.cz>
      Reported-and-tested-by: NDave Jones <davej@fedoraproject.org>
      Signed-off-by: NJan Kara <jack@suse.cz>
      83c0e1b4
    • F
      Btrfs: add support for inode properties · 63541927
      Filipe David Borba Manana 提交于
      This change adds infrastructure to allow for generic properties for
      inodes. Properties are name/value pairs that can be associated with
      inodes for different purposes. They are stored as xattrs with the
      prefix "btrfs."
      
      Properties can be inherited - this means when a directory inode has
      inheritable properties set, these are added to new inodes created
      under that directory. Further, subvolumes can also have properties
      associated with them, and they can be inherited from their parent
      subvolume. Naturally, directory properties have priority over subvolume
      properties (in practice a subvolume property is just a regular
      property associated with the root inode, objectid 256, of the
      subvolume's fs tree).
      
      This change also adds one specific property implementation, named
      "compression", whose values can be "lzo" or "zlib" and it's an
      inheritable property.
      
      The corresponding changes to btrfs-progs were also implemented.
      A patch with xfstests for this feature will follow once there's
      agreement on this change/feature.
      
      Further, the script at the bottom of this commit message was used to
      do some benchmarks to measure any performance penalties of this feature.
      
      Basically the tests correspond to:
      
      Test 1 - create a filesystem and mount it with compress-force=lzo,
      then sequentially create N files of 64Kb each, measure how long it took
      to create the files, unmount the filesystem, mount the filesystem and
      perform an 'ls -lha' against the test directory holding the N files, and
      report the time the command took.
      
      Test 2 - create a filesystem and don't use any compression option when
      mounting it - instead set the compression property of the subvolume's
      root to 'lzo'. Then create N files of 64Kb, and report the time it took.
      The unmount the filesystem, mount it again and perform an 'ls -lha' like
      in the former test. This means every single file ends up with a property
      (xattr) associated to it.
      
      Test 3 - same as test 2, but uses 4 properties - 3 are duplicates of the
      compression property, have no real effect other than adding more work
      when inheriting properties and taking more btree leaf space.
      
      Test 4 - same as test 3 but with 10 properties per file.
      
      Results (in seconds, and averages of 5 runs each), for different N
      numbers of files follow.
      
      * Without properties (test 1)
      
                          file creation time        ls -lha time
      10 000 files              3.49                   0.76
      100 000 files            47.19                   8.37
      1 000 000 files         518.51                 107.06
      
      * With 1 property (compression property set to lzo - test 2)
      
                          file creation time        ls -lha time
      10 000 files              3.63                    0.93
      100 000 files            48.56                    9.74
      1 000 000 files         537.72                  125.11
      
      * With 4 properties (test 3)
      
                          file creation time        ls -lha time
      10 000 files              3.94                    1.20
      100 000 files            52.14                   11.48
      1 000 000 files         572.70                  142.13
      
      * With 10 properties (test 4)
      
                          file creation time        ls -lha time
      10 000 files              4.61                    1.35
      100 000 files            58.86                   13.83
      1 000 000 files         656.01                  177.61
      
      The increased latencies with properties are essencialy because of:
      
      *) When creating an inode, we now synchronously write 1 more item
         (an xattr item) for each property inherited from the parent dir
         (or subvolume). This could be done in an asynchronous way such
         as we do for dir intex items (delayed-inode.c), which could help
         reduce the file creation latency;
      
      *) With properties, we now have larger fs trees. For this particular
         test each xattr item uses 75 bytes of leaf space in the fs tree.
         This could be less by using a new item for xattr items, instead of
         the current btrfs_dir_item, since we could cut the 'location' and
         'type' fields (saving 18 bytes) and maybe 'transid' too (saving a
         total of 26 bytes per xattr item) from the btrfs_dir_item type.
      
      Also tried batching the xattr insertions (ignoring proper hash
      collision handling, since it didn't exist) when creating files that
      inherit properties from their parent inode/subvolume, but the end
      results were (surprisingly) essentially the same.
      
      Test script:
      
      $ cat test.pl
        #!/usr/bin/perl -w
      
        use strict;
        use Time::HiRes qw(time);
        use constant NUM_FILES => 10_000;
        use constant FILE_SIZES => (64 * 1024);
        use constant DEV => '/dev/sdb4';
        use constant MNT_POINT => '/home/fdmanana/btrfs-tests/dev';
        use constant TEST_DIR => (MNT_POINT . '/testdir');
      
        system("mkfs.btrfs", "-l", "16384", "-f", DEV) == 0 or die "mkfs.btrfs failed!";
      
        # following line for testing without properties
        #system("mount", "-o", "compress-force=lzo", DEV, MNT_POINT) == 0 or die "mount failed!";
      
        # following 2 lines for testing with properties
        system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";
        system("btrfs", "prop", "set", MNT_POINT, "compression", "lzo") == 0 or die "set prop failed!";
      
        system("mkdir", TEST_DIR) == 0 or die "mkdir failed!";
        my ($t1, $t2);
      
        $t1 = time();
        for (my $i = 1; $i <= NUM_FILES; $i++) {
            my $p = TEST_DIR . '/file_' . $i;
            open(my $f, '>', $p) or die "Error opening file!";
            $f->autoflush(1);
            for (my $j = 0; $j < FILE_SIZES; $j += 4096) {
                print $f ('A' x 4096) or die "Error writing to file!";
            }
            close($f);
        }
        $t2 = time();
        print "Time to create " . NUM_FILES . ": " . ($t2 - $t1) . " seconds.\n";
        system("umount", DEV) == 0 or die "umount failed!";
        system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";
      
        $t1 = time();
        system("bash -c 'ls -lha " . TEST_DIR . " > /dev/null'") == 0 or die "ls failed!";
        $t2 = time();
        print "Time to ls -lha all files: " . ($t2 - $t1) . " seconds.\n";
        system("umount", DEV) == 0 or die "umount failed!";
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      63541927
    • J
      rwsem: add rwsem_is_contended · 4a444b1f
      Josef Bacik 提交于
      Btrfs needs a simple way to know if it needs to let go of it's read lock on a
      rwsem.  Introduce rwsem_is_contended to check to see if there are any waiters on
      this rwsem currently.  This is just a hueristic, it is meant to be light and not
      100% accurate and called by somebody already holding on to the rwsem in either
      read or write.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      4a444b1f
    • L
      Btrfs/tracepoint: update new flags for ordered extent TP · 792ddef0
      Liu Bo 提交于
      Flag BTRFS_ORDERED_TRUNCATED is a new one, update the tracepoint to
      support it.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      792ddef0