1. 28 Oct, 2011 6 commits
    • A
      vfs: add generic_file_llseek_size · 5760495a
      Andi Kleen committed
      Add a generic_file_llseek variant to the VFS that allows passing in
      the maximum file size of the file system, instead of always
      using maxbytes from the superblock.
      
      This can be used to eliminate some cut'n'paste seek code in ext4.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      5760495a
    • A
      vfs: do (nearly) lockless generic_file_llseek · ef3d0fd2
      Andi Kleen committed
      The i_mutex lock use of generic_file_llseek hurts.  Independent processes
      accessing the same file synchronize over a single lock, even though
      they have no need for synchronization at all.
      
      Under high utilization this can cause llseek to scale very poorly on larger
      systems.
      
      This patch does some rethinking of the llseek locking model:
      
      First the 64bit f_pos is not necessarily atomic without locks
      on 32bit systems. This can already cause races with read() today.
      This was discussed on linux-kernel in the past and deemed acceptable.
      The patch does not change that.
      
      Let's look at the different seek variants:
      
      SEEK_SET: Doesn't really need any locking.
      If there's a race one writer wins, the other loses.
      
      For 32bit the non atomic update races against read()
      stay the same. Without a lock they can also happen
      against write() now.  The read() race was deemed
      acceptable in past discussions, and I think if it's
      ok for read it's ok for write too.
      
      => Don't need a lock.
      
      SEEK_END: This behaves like SEEK_SET plus it reads
      the maximum size too. Reading the maximum size would have the
      32bit atomic problem. But luckily we already have a way to read
      the maximum size without locking (i_size_read), so we
      can just use that instead.
      
      Without i_mutex there is no synchronization with write() anymore,
      however since the write() update is atomic on 64bit it just behaves
      like another racy SEEK_SET.  On non atomic 32bit it's the same
      as SEEK_SET.
      
      => Don't need a lock, but need to use i_size_read()
      
      SEEK_CUR: This has a read-modify-write race window
      on the same file. One could argue that any application
      doing unsynchronized seeks on the same file is already broken.
      But for the sake of not adding a regression here I'm
      using the file->f_lock to synchronize this. Using this
      lock is much better than the inode mutex because it doesn't
      synchronize between processes.
      
      => So we still need a lock, but can use f_lock.
      
      This patch implements this new scheme in generic_file_llseek.
      I dropped generic_file_llseek_unlocked and changed all callers.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      ef3d0fd2
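The locking scheme described in this commit can be sketched in user space roughly as follows. This is an illustration under assumptions, not the kernel code: a C11 `atomic_flag` spin lock stands in for `file->f_lock`, an atomic load stands in for `i_size_read()`, and all `sketch_*` names are hypothetical. Overflow and maximum-size checks are left out to keep the three cases visible.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>  /* SEEK_SET, SEEK_CUR, SEEK_END */

struct sketch_inode {
    _Atomic int64_t size;       /* read without a lock, like i_size_read() */
};

struct sketch_file {
    struct sketch_inode *inode;
    atomic_flag f_lock;         /* per open file, not shared across processes */
    int64_t pos;
};

static void sketch_inode_init(struct sketch_inode *ino, int64_t size)
{
    atomic_init(&ino->size, size);
}

static void sketch_file_init(struct sketch_file *f, struct sketch_inode *ino)
{
    f->inode = ino;
    atomic_flag_clear(&f->f_lock);
    f->pos = 0;
}

static int64_t sketch_lockless_llseek(struct sketch_file *f, int64_t offset,
                                      int whence)
{
    switch (whence) {
    case SEEK_SET:
        /* No lock: if two seeks race, one of them simply wins. */
        break;
    case SEEK_END:
        /* No lock either, but the size must be read atomically. */
        offset += atomic_load(&f->inode->size);
        break;
    case SEEK_CUR:
        /* Read-modify-write of pos: serialize on the per-file lock. */
        while (atomic_flag_test_and_set(&f->f_lock))
            ;  /* spin */
        offset += f->pos;
        if (offset >= 0)
            f->pos = offset;
        atomic_flag_clear(&f->f_lock);
        return offset < 0 ? -EINVAL : offset;
    default:
        return -EINVAL;
    }

    if (offset < 0)
        return -EINVAL;
    f->pos = offset;
    return offset;
}
```

Only the `SEEK_CUR` path takes a lock, and that lock belongs to the open file rather than the inode, so independent processes with separate open files no longer contend.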
    • A
      vfs: add hex format for MAY_* flag values · 8522ca58
      Aneesh Kumar K.V committed
      We are going to add more flags, and having them in hex format
      makes it simpler.
      Acked-by: J. Bruce Fields <bfields@redhat.com>
      Acked-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      8522ca58
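To illustrate why hex notation helps once more flags are added, here is a small sketch. The `SKETCH_MAY_*` names and values are made up for the example and are not copied from the kernel header.

```c
/* Hypothetical permission flags, one bit each. In hex, each new flag is
 * visibly the next bit: 0x1, 0x2, 0x4, 0x8, 0x10, 0x20, ... */
#define SKETCH_MAY_EXEC   0x00000001
#define SKETCH_MAY_WRITE  0x00000002
#define SKETCH_MAY_READ   0x00000004
#define SKETCH_MAY_APPEND 0x00000008
#define SKETCH_MAY_ACCESS 0x00000010
#define SKETCH_MAY_OPEN   0x00000020

/* Returns 1 if every bit in 'wanted' is set in 'mask', 0 otherwise. */
static int sketch_may_has(unsigned int mask, unsigned int wanted)
{
    return (mask & wanted) == wanted;
}
```

In decimal the same series reads 1, 2, 4, 8, 16, 32, which makes it harder to spot a value that accidentally sets two bits.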
    • S
      Fix build break when freezer not configured · e0c8ea1a
      Steve French committed
      fs/cifs/transport.c: In function 'wait_for_response':
      fs/cifs/transport.c:328: error: implicit declaration of function 'wait_event_freezekillable'
      
      Caused by commit f06ac72e ("cifs, freezer: add
      wait_event_freezekillable and have cifs use it").  In this config,
      CONFIG_FREEZER is not set.
      Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com>
      CC: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      e0c8ea1a
    • D
      Revert "drm/ttm: add a way to bo_wait for either the last read or last write" · 1717c0e2
      Dave Airlie committed
      This reverts commit dfadbbdb.
      
      Further upstream discussion between Marek and Thomas decided this wasn't
      fully baked and needed further work, so revert it before it hits mainline.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      1717c0e2
    • D
      Revert "drm/radeon/kms: add a new gem_wait ioctl with read/write flags" · 83f30d0e
      Dave Airlie committed
      This reverts commit d3ed7402.
      
      Further upstream discussion between Thomas and Marek decided this needed
      more work and driver specifics. So revert before it goes upstream.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      83f30d0e
  2. 27 Oct, 2011 22 commits
  3. 26 Oct, 2011 1 commit
  4. 25 Oct, 2011 9 commits
    • B
      ore: RAID5 Write · 769ba8d9
      Boaz Harrosh committed
      This is finally the RAID5 Write support.
      
      The bigger part of this patch is not the XOR engine itself, but the
      read4write logic, which is a complete mini prepare_for_striping
      reading engine that can read scattered pages of a stripe into cache
      so they can be used for the XOR calculation. That is needed if the
      write was not stripe aligned.
      
      The main algorithm behind the XOR engine is the 2 dimensional array:
      	struct __stripe_pages_2d.
      A drawing might save 1000 words
      ---
      
      __stripe_pages_2d
             |
       n = pages_in_stripe_unit;
       w = group_width - parity;
             |                            pages array presented to the XOR lib
             |                                                |
             V                                                |
       __1_page_stripe[0].pages --> [c0][c1]..[cw][c_par] <---|
             |                                                |
       __1_page_stripe[1].pages --> [c0][c1]..[cw][c_par] <---
             |
      ...    |                         ...
             |
       __1_page_stripe[n].pages --> [c0][c1]..[cw][c_par]
                                     ^
                                     |
                 data added columns first then row
      
      ---
      The pages are put on this array columns first, i.e.:
      	p0-of-c0, p1-of-c0, ... pn-of-c0, p0-of-c1, ...
      So we are doing a corner turn of the pages.
      
      Note that pages will zigzag down and left, but are put sequentially
      in growing order. So when the time comes to XOR the stripe, only the
      beginning and end of the array need be checked. We scan the array
      and any NULL spot will be filled by pages-to-be-read.
      
      The FS that wants to support RAID5 needs to supply an
      operations-vector that searches a given page in cache, and specifies
      if the page is uptodate or needs reading. All these pages to be read
      are put on a slave ore_io_state and synchronously read. All the pages
      of a stripe are read in one IO, using the scatter gather mechanism.
      
      On write we constrain our IO to be incomplete on only a single
      stripe. That means either the complete IO is within a single stripe, so
      we might have pages to read at the beginning, the end, or both ends of
      the stripe, or we have some reading to do at the beginning but end at a
      stripe boundary. The leftover pages are pushed to the next IO by the API
      already established by previous work, where an IO offset/length
      combination presented to the ORE might get the length truncated and
      the user must re-submit the leftover pages. (Both exofs and NFS
      support this)
      
      But any ORE user should make its best effort to align its IO
      beforehand and avoid complications. A cached ore_layout->stripe_size
      member can be used for that calculation. (NOTE: ORE demands
      that stripe_size may not be bigger than 32bit)
      
      What else? Well read it and tell me.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      769ba8d9
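The column-first placement and per-row XOR described in this commit message can be sketched as below. This is a toy illustration with tiny fixed dimensions, not the ORE implementation: `SKETCH_PAGE`, `SKETCH_ROWS`, `SKETCH_DATA_COLS` and every `sketch_*` name are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    SKETCH_PAGE      = 8,  /* bytes per toy "page" */
    SKETCH_ROWS      = 2,  /* n = pages_in_stripe_unit */
    SKETCH_DATA_COLS = 3   /* w = group_width - parity */
};

/* One row: a data page per column plus the parity page for that row. */
struct sketch_row {
    const uint8_t *data[SKETCH_DATA_COLS];
    uint8_t parity[SKETCH_PAGE];
};

struct sketch_stripe {
    struct sketch_row rows[SKETCH_ROWS];
};

/* Place the k-th sequential page columns first:
 * p0-of-c0, p1-of-c0, ..., then p0-of-c1, and so on. */
static void sketch_add_page(struct sketch_stripe *s, size_t k,
                            const uint8_t *page)
{
    size_t col = k / SKETCH_ROWS;
    size_t row = k % SKETCH_ROWS;

    s->rows[row].data[col] = page;
}

/* XOR the data pages of each row into that row's parity page.
 * Returns -1 if a slot is still NULL, meaning a page must be read first. */
static int sketch_xor_stripe(struct sketch_stripe *s)
{
    for (size_t r = 0; r < SKETCH_ROWS; r++) {
        memset(s->rows[r].parity, 0, SKETCH_PAGE);
        for (size_t c = 0; c < SKETCH_DATA_COLS; c++) {
            const uint8_t *page = s->rows[r].data[c];

            if (!page)
                return -1;
            for (size_t b = 0; b < SKETCH_PAGE; b++)
                s->rows[r].parity[b] ^= page[b];
        }
    }
    return 0;
}
```

A stripe that has been zero-initialized starts with every slot NULL, so a write that is not stripe aligned leaves holes that `sketch_xor_stripe()` reports instead of computing parity over missing data.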
    • B
      ore: RAID5 read · a1fec1db
      Boaz Harrosh committed
      This patch introduces the first stage of RAID5 support,
      mainly the skip-over-raid-units when reading. For
      writes it inserts BLANK units where XOR blocks
      should be calculated and written to.
      
      It introduces the new "general raid maths", and the main
      additional parameters and components needed for raid5.
      
      Since at this stage it could corrupt future versions that
      actually do support raid5, the enablement of raid5
      mounting and setting of parity-count > 0 is disabled. So
      the raid5 code will never be used. Mounting of raid5 is
      only enabled later once the basic XOR write is also in.
      But if the patch "enable RAID5" is applied, this code has
      been tested to properly read raid5 volumes
      and conforms to the standard.
      
      Also it has been tested that the new maths still properly
      supports RAID0 and grouping code just as before.
      (BTW: I have found more bugs in the pnfs-obj RAID math
       fixed here)
      
      The ore.c file is getting too big, so new ore_raid.[hc]
      files are added that will include the special raid stuff
      that is not used in striping and mirrors. With future write
      support these will get bigger.
      When adding the ore_raid.c to Kbuild file I was forced to
      rename ore.ko to libore.ko. Is it possible to keep source
      file, say ore.c and module file ore.ko the same even if there
      are multiple files inside ore.ko?
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      a1fec1db
    • B
      ore: Make ore_calc_stripe_info EXPORT_SYMBOL · 611d7a5d
      Boaz Harrosh committed
      ore_calc_stripe_info is needed by exofs::export.c
      for the layout calculations. Make it exportable.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      611d7a5d
    • F
      TCP: remove TCP_DEBUG · 78d81d15
      Flavio Leitner committed
      It was enabled by default and the messages guarded
      by the define are useful.
      Signed-off-by: Flavio Leitner <fbl@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      78d81d15
    • K
      m68k: Finally remove leftover markers sections · bc74ee97
      Kirill Tkhai committed
      Markers have already been removed twice:
      
      1: fc537766
      2: eb878b3b
      
      But a little bit is still here.
      Signed-off-by: Tkhai Kirill <tkhai@yandex.ru>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      bc74ee97
    • D
      hwmon: Add driver for EXYNOS4 TMU · 9d97e5c8
      Donggeun Kim committed
      This patch allows reading the temperature
      from the TMU (Thermal Management Unit) of the SAMSUNG EXYNOS4 series of SoCs.
      Signed-off-by: Donggeun Kim <dg77.kim@samsung.com>
      Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
      Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
      Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
      9d97e5c8
    • A
      net/9p: Convert net/9p protocol dumps to tracepoints · 348b5901
      Aneesh Kumar K.V committed
      This gives more control over debugging.
      root@qemu-img-64:~# ls /pass/123
      ls: cannot access /pass/123: No such file or directory
      root@qemu-img-64:~# cat /sys/kernel/debug/tracing/trace
      # tracer: nop
      #
      #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
      #              | |       |          |         |
                    ls-1536  [001]    70.928584: 9p_protocol_dump: clnt 18446612132784021504 P9_TWALK(tag = 1)
      000: 16 00 00 00 6e 01 00 01 00 00 00 02 00 00 00 01
      010: 00 03 00 31 32 33 00 00 00 ff ff ff ff 00 00 00
      
                    ls-1536  [001]    70.928587: <stack trace>
       => trace_9p_protocol_dump
       => p9pdu_finalize
       => p9_client_rpc
       => p9_client_walk
       => v9fs_vfs_lookup
       => d_alloc_and_lookup
       => walk_component
       => path_lookupat
                    ls-1536  [000]    70.929696: 9p_protocol_dump: clnt 18446612132784021504 P9_RLERROR(tag = 1)
      000: 0b 00 00 00 07 01 00 02 00 00 00 4e 03 00 02 00
      010: 00 00 00 00 03 00 02 00 00 00 00 00 ff 43 00 00
      
                    ls-1536  [000]    70.929697: <stack trace>
       => trace_9p_protocol_dump
       => p9_client_rpc
       => p9_client_walk
       => v9fs_vfs_lookup
       => d_alloc_and_lookup
       => walk_component
       => path_lookupat
       => do_path_lookup
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
      348b5901
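The hex dumps in the trace above start with the common 9P message header: a 4-byte little-endian size, a 1-byte message type and a 2-byte little-endian tag. A small sketch of decoding it follows; the `sketch_9p_*` names are hypothetical and this is not the kernel's parser.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed 7-byte header found at the start of every 9P message. */
struct sketch_9p_hdr {
    uint32_t size;  /* total message length, including this header */
    uint8_t  type;  /* message type code */
    uint16_t tag;   /* request/response matching tag */
};

/* Decode the header from a raw dump. Returns 0 on success,
 * -1 if fewer than 7 bytes are available. */
static int sketch_9p_parse_hdr(const uint8_t *buf, size_t len,
                               struct sketch_9p_hdr *out)
{
    if (len < 7)
        return -1;

    out->size = (uint32_t)buf[0] |
                ((uint32_t)buf[1] << 8) |
                ((uint32_t)buf[2] << 16) |
                ((uint32_t)buf[3] << 24);
    out->type = buf[4];
    out->tag  = (uint16_t)(buf[5] | (buf[6] << 8));
    return 0;
}
```

Fed the first dump (`16 00 00 00 6e 01 00 ...`), this yields size 22, type 110 and tag 1, matching the `P9_TWALK(tag = 1)` label. The second dump (`0b 00 00 00 07 01 00 ...`) yields size 11, type 7 and tag 1, matching `P9_RLERROR(tag = 1)`.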
    • D
      fs/9p: change an int to unsigned int · ef6b0807
      Dan Carpenter committed
      Without this, msize=4294967295 will result in a crash.
      Signed-off-by: Dan Carpenter <error27@gmail.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
      ef6b0807
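A short sketch of the type problem behind this fix, assuming a platform with 32-bit `int`. The helper names are hypothetical, and the example does not reproduce the actual crash path in fs/9p, only the way the value changes when it is stored in a signed variable.

```c
#include <stdlib.h>

/* Store the parsed option in a signed int, as before the fix.
 * With a 32-bit int, 4294967295 does not fit; the conversion is
 * implementation-defined and typically wraps to -1. */
static int sketch_msize_as_int(const char *opt)
{
    return (int)strtoul(opt, NULL, 10);
}

/* Store the parsed option in an unsigned int, as after the fix.
 * 4294967295 is exactly UINT_MAX on a 32-bit unsigned int. */
static unsigned int sketch_msize_as_uint(const char *opt)
{
    return (unsigned int)strtoul(opt, NULL, 10);
}
```

A negative message size is then free to flow into later length comparisons and allocations, which is the kind of situation the unsigned type rules out.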
    • A
      fs/9p: Update zero-copy implementation in 9p · abfa034e
      Aneesh Kumar K.V committed
      * remove a lot of updates to different data structures
      * add a separate callback for zero copy requests
      * the above makes the non zero copy code path simpler
      * remove conditionalizing TREAD/TREADDIR/TWRITE in the zero copy path
      * Fix the dotu p9_check_errors with zero copy. Add sufficient doc around
      * Add support for both in and output buffers in zero copy callback
      * pin and unpin pages in the same context
      * use helpers instead of defining page offset and rest of page ourself
      * Fix mem leak in p9_check_errors
      * Remove 'E' and 'F' in p9pdu_vwritef
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
      abfa034e
  5. 24 Oct, 2011 2 commits
    • N
      dt: Add empty of_match_node() macro · 5762c205
      Nicolas Ferre committed
      Add an empty macro for of_match_node() that will save
      some '#ifdef CONFIG_OF' for non-dt builds.
      
      I have chosen to use a macro instead of a function to
      be able to avoid defining the first parameter.
      In fact, this "struct of_device_id *" first parameter
      is usually not defined either on non-dt builds.
      Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Acked-by: Grant Likely <grant.likely@secretlab.ca>
      5762c205
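The reason a macro works where an inline function would not can be sketched as follows. `SKETCH_CONFIG_OF` and all `sketch_*` names are hypothetical. The point is only that the preprocessor discards the first argument, so it never has to be declared when the option is off.

```c
#include <stddef.h>

struct sketch_device_node {
    const char *compatible;
};

#ifdef SKETCH_CONFIG_OF
/* Device-tree build: real match table type and lookup function. */
struct sketch_of_device_id {
    const char *compatible;
};
const struct sketch_of_device_id *
sketch_of_match_node(const struct sketch_of_device_id *matches,
                     const struct sketch_device_node *node);
#else
/* Non-dt build: the arguments are dropped by the preprocessor, so the
 * match table passed as the first argument need not exist at all. */
#define sketch_of_match_node(_matches, _node) NULL
#endif

/* Driver code with no #ifdef of its own. 'sketch_ids' is deliberately
 * not declared anywhere in the non-dt build. Returns 1 on a match,
 * 0 on no match, -1 if no node was given. */
static int sketch_probe(const struct sketch_device_node *node)
{
    if (!node)
        return -1;
    if (sketch_of_match_node(sketch_ids, node))
        return 1;
    return 0;
}
```

With an inline function instead of the macro, the call in `sketch_probe()` would fail to compile in the non-dt build because `sketch_ids` is undeclared.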
    • E
      ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT · 66b13d99
      Eric Dumazet committed
      There is a long-standing bug in the Linux TCP stack concerning ACK messages sent
      on behalf of TIME_WAIT sockets.
      
      In the IP header of the ACK message, we choose to reflect TOS field of
      incoming message, and this might break some setups.
      
      Example of things that were broken :
        - Routing using TOS as a selector
        - Firewalls
        - Traffic classification / shaping
      
      We now remember in timewait structure the inet tos field and use it in
      ACK generation, and route lookup.
      
      Notes :
       - We still reflect incoming TOS in RST messages.
       - We could extend MuraliRaja Muniraju patch to report TOS value in
      netlink messages for TIME_WAIT sockets.
       - A patch is needed for IPv6
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      66b13d99
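The behavior change described in this commit can be sketched as follows: the socket's own TOS is saved when it enters TIME_WAIT and used for ACKs, while RSTs keep reflecting the incoming TOS. All names here (`sketch_sock`, `sketch_timewait`, `tw_tos`, and the helpers) are hypothetical illustrations and not the kernel's structures.

```c
#include <stdint.h>

/* Hypothetical full socket, carrying the TOS configured for it. */
struct sketch_sock {
    uint8_t tos;
};

/* Hypothetical reduced TIME_WAIT state that now remembers the TOS. */
struct sketch_timewait {
    uint8_t tw_tos;
};

enum sketch_reply {
    SKETCH_REPLY_ACK,
    SKETCH_REPLY_RST
};

/* Copy the TOS over when the full socket is replaced by TIME_WAIT state. */
static void sketch_enter_timewait(struct sketch_timewait *tw,
                                  const struct sketch_sock *sk)
{
    tw->tw_tos = sk->tos;
}

/* Pick the TOS for a reply sent on behalf of a TIME_WAIT socket:
 * ACKs use the remembered value, RSTs still reflect the incoming one. */
static uint8_t sketch_reply_tos(const struct sketch_timewait *tw,
                                uint8_t incoming_tos,
                                enum sketch_reply kind)
{
    return kind == SKETCH_REPLY_ACK ? tw->tw_tos : incoming_tos;
}
```

The same remembered value would also be what a route lookup keyed on TOS should use for the ACK.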