1. 14 2月, 2017 1 次提交
  2. 06 12月, 2016 3 次提交
  3. 30 11月, 2016 1 次提交
  4. 26 9月, 2016 1 次提交
  5. 08 6月, 2016 2 次提交
  6. 26 5月, 2016 1 次提交
  7. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  8. 20 1月, 2016 5 次提交
  9. 07 1月, 2016 1 次提交
  10. 11 10月, 2015 1 次提交
  11. 09 8月, 2015 1 次提交
    • O
      Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation · b4ee1782
      Omar Sandoval 提交于
      The current RAID 5/6 recovery code isn't quite prepared to handle
      missing devices. In particular, it expects a bio that we previously
      attempted to use in the read path, meaning that it has valid pages
      allocated. However, missing devices have a NULL blkdev, and we can't
      call bio_add_page() on a bio with a NULL blkdev. We could do manual
      manipulation of bio->bi_io_vec, but that's pretty gross. So instead, add
      a separate path that allows us to manually add pages to the rbio.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      b4ee1782
  12. 29 7月, 2015 1 次提交
    • C
      block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig 提交于
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not beeing persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Reviewed-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      4246a0b6
  13. 04 3月, 2015 1 次提交
  14. 17 2月, 2015 1 次提交
  15. 22 1月, 2015 3 次提交
  16. 03 12月, 2014 7 次提交
    • M
      Btrfs, raid56: fix use-after-free problem in the final device replace procedure on raid56 · 4245215d
      Miao Xie 提交于
      The commit c404e0dc (Btrfs: fix use-after-free in the finishing
      procedure of the device replace) fixed a use-after-free problem
      which happened when removing the source device at the end of device
      replace, but at that time, btrfs didn't support device replace
      on raid56, so we didn't fix the problem on the raid56 profile.
      Currently, we implemented device replace for raid56, so we need
      kick that problem out before we enable that function for raid56.
      
      The fix method is very simple, we just increase the bio per-cpu
      counter before we submit a raid56 io, and decrease the counter
      when the raid56 io ends.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      4245215d
    • M
      Btrfs, replace: write raid56 parity into the replace target device · 76035976
      Miao Xie 提交于
      This function reused the code of parity scrub, and we just write
      the right parity or corrected parity into the target device before
      the parity scrub end.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      76035976
    • M
      Btrfs, replace: write dirty pages into the replace target device · 2c8cdd6e
      Miao Xie 提交于
      The implementation is simple:
      - In order to avoid changing the code logic of btrfs_map_bio and
        RAID56, we add the stripes of the replace target devices at the
        end of the stripe array in btrfs bio, and we sort those target
        device stripes in the array. And we keep the number of the target
        device stripes in the btrfs bio.
      - Except write operation on RAID56, all the other operation don't
        take the target device stripes into account.
      - When we do write operation, we read the data from the common devices
        and calculate the parity. Then write the dirty data and new parity
        out, at this time, we will find the relative replace target stripes
        and wirte the relative data into it.
      
      Note: The function that copying old data on the source device to
      the target device was implemented in the past, it is similar to
      the other RAID type.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      2c8cdd6e
    • M
      Btrfs, raid56: support parity scrub on raid56 · 5a6ac9ea
      Miao Xie 提交于
      The implementation is:
      - Read and check all the data with checksum in the same stripe.
        All the data which has checksum is COW data, and we are sure
        that it is not changed though we don't lock the stripe. because
        the space of that data just can be reclaimed after the current
        transction is committed, and then the fs can use it to store the
        other data, but when doing scrub, we hold the current transaction,
        that is that data can not be recovered, it is safe that read and check
        it out of the stripe lock.
      - Lock the stripe
      - Read out all the data without checksum and parity
        The data without checksum and the parity may be changed if we don't
        lock the stripe, so we need read it in the stripe lock context.
      - Check the parity
      - Re-calculate the new parity and write back it if the old parity
        is not right
      - Unlock the stripe
      
      If we can not read out the data or the data we read is corrupted,
      we will try to repair it. If the repair fails. we will mark the
      horizontal sub-stripe(pages on the same horizontal) as corrupted
      sub-stripe, and we will skip the parity check and repair of that
      horizontal sub-stripe.
      
      And in order to skip the horizontal sub-stripe that has no data, we
      introduce a bitmap. If there is some data on the horizontal sub-stripe,
      we will the relative bit to 1, and when we check and repair the
      parity, we will skip those horizontal sub-stripes that the relative
      bits is 0.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      5a6ac9ea
    • M
      Btrfs, raid56: use a variant to record the operation type · 1b94b556
      Miao Xie 提交于
      We will introduce new operation type later, if we still use integer
      variant as bool variant to record the operation type, we would add new
      variant and increase the size of raid bio structure. It is not good,
      by this patch, we define different number for different operation,
      and we can just use a variant to record the operation type.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      1b94b556
    • M
      Btrfs, scrub: repair the common data on RAID5/6 if it is corrupted · af8e2d1d
      Miao Xie 提交于
      This patch implement the RAID5/6 common data repair function, the
      implementation is similar to the scrub on the other RAID such as
      RAID1, the differentia is that we don't read the data from the
      mirror, we use the data repair function of RAID5/6.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      af8e2d1d
    • M
      Btrfs, raid56: don't change bbio and raid_map · b89e1b01
      Miao Xie 提交于
      Because we will reuse bbio and raid_map during the scrub later, it is
      better that we don't change any variant of bbio and don't free it at
      the end of IO request. So we introduced similar variants into the raid
      bio, and don't access those bbio's variants any more.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      b89e1b01
  17. 18 9月, 2014 1 次提交
  18. 24 8月, 2014 1 次提交
    • L
      Btrfs: fix task hang under heavy compressed write · 9e0af237
      Liu Bo 提交于
      This has been reported and discussed for a long time, and this hang occurs in
      both 3.15 and 3.16.
      
      Btrfs now migrates to use kernel workqueue, but it introduces this hang problem.
      
      Btrfs has a kind of work queued as an ordered way, which means that its
      ordered_func() must be processed in the way of FIFO, so it usually looks like --
      
      normal_work_helper(arg)
          work = container_of(arg, struct btrfs_work, normal_work);
      
          work->func() <---- (we name it work X)
          for ordered_work in wq->ordered_list
                  ordered_work->ordered_func()
                  ordered_work->ordered_free()
      
      The hang is a rare case, first when we find free space, we get an uncached block
      group, then we go to read its free space cache inode for free space information,
      so it will
      
      file a readahead request
          btrfs_readpages()
               for page that is not in page cache
                      __do_readpage()
                           submit_extent_page()
                                 btrfs_submit_bio_hook()
                                       btrfs_bio_wq_end_io()
                                       submit_bio()
                                       end_workqueue_bio() <--(ret by the 1st endio)
                                            queue a work(named work Y) for the 2nd
                                            also the real endio()
      
      So the hang occurs when work Y's work_struct and work X's work_struct happens
      to share the same address.
      
      A bit more explanation,
      
      A,B,C -- struct btrfs_work
      arg   -- struct work_struct
      
      kthread:
      worker_thread()
          pick up a work_struct from @worklist
          process_one_work(arg)
      	worker->current_work = arg;  <-- arg is A->normal_work
      	worker->current_func(arg)
      		normal_work_helper(arg)
      		     A = container_of(arg, struct btrfs_work, normal_work);
      
      		     A->func()
      		     A->ordered_func()
      		     A->ordered_free()  <-- A gets freed
      
      		     B->ordered_func()
      			  submit_compressed_extents()
      			      find_free_extent()
      				  load_free_space_inode()
      				      ...   <-- (the above readhead stack)
      				      end_workqueue_bio()
      					   btrfs_queue_work(work C)
      		     B->ordered_free()
      
      As if work A has a high priority in wq->ordered_list and there are more ordered
      works queued after it, such as B->ordered_func(), its memory could have been
      freed before normal_work_helper() returns, which means that kernel workqueue
      code worker_thread() still has worker->current_work pointer to be work
      A->normal_work's, ie. arg's address.
      
      Meanwhile, work C is allocated after work A is freed, work C->normal_work
      and work A->normal_work are likely to share the same address(I confirmed this
      with ftrace output, so I'm not just guessing, it's rare though).
      
      When another kthread picks up work C->normal_work to process, and finds our
      kthread is processing it(see find_worker_executing_work()), it'll think
      work C as a collision and skip then, which ends up nobody processing work C.
      
      So the situation is that our kthread is waiting forever on work C.
      
      Besides, there're other cases that can lead to deadlock, but the real problem
      is that all btrfs workqueue shares one work->func, -- normal_work_helper,
      so this makes each workqueue to have its own helper function, but only a
      wraper pf normal_work_helper.
      
      With this patch, I no long hit the above hang.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      9e0af237
  19. 29 6月, 2014 1 次提交
    • L
      Btrfs: fix crash when mounting raid5 btrfs with missing disks · 5588383e
      Liu Bo 提交于
      The reproducer is
      
      $ mkfs.btrfs D1 D2 D3 -mraid5
      $ mkfs.ext4 D2 && mkfs.ext4 D3
      $ mount D1 /btrfs -odegraded
      
      -------------------
      
      [   87.672992] ------------[ cut here ]------------
      [   87.673845] kernel BUG at fs/btrfs/raid56.c:1828!
      ...
      [   87.673845] RIP: 0010:[<ffffffff813efc7e>]  [<ffffffff813efc7e>] __raid_recover_end_io+0x4ae/0x4d0
      ...
      [   87.673845] Call Trace:
      [   87.673845]  [<ffffffff8116bbc6>] ? mempool_free+0x36/0xa0
      [   87.673845]  [<ffffffff813f0255>] raid_recover_end_io+0x75/0xa0
      [   87.673845]  [<ffffffff81447c5b>] bio_endio+0x5b/0xa0
      [   87.673845]  [<ffffffff81447cb2>] bio_endio_nodec+0x12/0x20
      [   87.673845]  [<ffffffff81374621>] end_workqueue_fn+0x41/0x50
      [   87.673845]  [<ffffffff813ad2aa>] normal_work_helper+0xca/0x2c0
      [   87.673845]  [<ffffffff8108ba2b>] process_one_work+0x1eb/0x530
      [   87.673845]  [<ffffffff8108b9c9>] ? process_one_work+0x189/0x530
      [   87.673845]  [<ffffffff8108c15b>] worker_thread+0x11b/0x4f0
      [   87.673845]  [<ffffffff8108c040>] ? rescuer_thread+0x290/0x290
      [   87.673845]  [<ffffffff810939c4>] kthread+0xe4/0x100
      [   87.673845]  [<ffffffff810938e0>] ? kthread_create_on_node+0x220/0x220
      [   87.673845]  [<ffffffff817e7c7c>] ret_from_fork+0x7c/0xb0
      [   87.673845]  [<ffffffff810938e0>] ? kthread_create_on_node+0x220/0x220
      
      -------------------
      
      It's because that we miscalculate @rbio->bbio->error so that it doesn't
      reach maximum of tolerable errors while it should have.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Tested-by: Satoru Takeuchi<takeuchi_satoru@jp.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      5588383e
  20. 11 3月, 2014 2 次提交
  21. 24 11月, 2013 1 次提交
    • K
      block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet 提交于
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>6
      4f024f37
  22. 12 11月, 2013 1 次提交
  23. 01 9月, 2013 2 次提交