1. 13 Feb, 2007 19 commits
    • [PATCH] eCryptfs: Public key transport mechanism · 88b4a07e
      Michael Halcrow committed
      This is the transport code for public key functionality in eCryptfs.  It
      manages encryption/decryption request queues with a transport mechanism.
      Currently, netlink is the only implemented transport.
      
      Each inode has a unique File Encryption Key (FEK).  In passphrase mode, a File
      Encryption Key Encryption Key (FEKEK) is generated from a salt/passphrase
      combination at mount time.  Each FEK is encrypted with this FEKEK and written
      into the header of its file using the packet format specified in RFC 2440.
      This is all symmetric key encryption, so it can all be done via the kernel
      crypto API.
      
      These new patches introduce public key encryption of the FEK.  There is no
      asymmetric key encryption support in the kernel crypto API, so eCryptfs pushes
      the FEK encryption and decryption out to a userspace daemon.  After
      considering our requirements and determining the complexity of using various
      transport mechanisms, we settled on netlink for this communication.
      
      eCryptfs stores authentication tokens into the kernel keyring.  These tokens
      correlate with individual keys.  For passphrase mode of operation, the
      authentication token contains the symmetric FEKEK.  For public key, the
      authentication token contains a PKI type and an opaque data blob managed by
      individual PKI modules in userspace.
      
      Each user who opens a file under an eCryptfs partition mounted in public key
      mode must be running a daemon.  That daemon has the user's credentials and has
      access to all of the keys to which the user should have access.  The daemon,
      when started, initializes the pluggable PKI modules available on the system
      and registers itself with the eCryptfs kernel module.  Userspace utilities
      register public key authentication tokens into the user session keyring.
      These authentication tokens correlate key signatures with PKI modules and PKI
      blobs.  The PKI blobs contain PKI-specific information necessary for the PKI
      module to carry out asymmetric key encryption and decryption.
      
      When the eCryptfs module parses the header of an existing file and finds a Tag
      1 (Public Key) packet (see RFC 2440), it reads in the public key identifier
      (signature).  The asymmetrically encrypted FEK is in the Tag 1 packet;
      eCryptfs puts together a decrypt request packet containing the signature and
      the encrypted FEK, then it passes it to the daemon registered for the
      current->euid via a netlink unicast to the PID of the daemon, which was
      registered at the time the daemon was started by the user.
      
      The daemon actually just makes calls to libecryptfs, which implements request
      packet parsing and manages PKI modules.  libecryptfs grabs the public key
      authentication token for the given signature from the user session keyring.
      This auth tok tells libecryptfs which PKI module should receive the request.
      libecryptfs then makes a decrypt() call to the PKI module, and it passes along
      the PKI blob from the auth tok.  The PKI module uses the blob to figure out how
      it should decrypt the data passed to it; it performs the decryption and passes
      the decrypted data back to libecryptfs.  libecryptfs then puts together a
      reply packet with the decrypted FEK and passes that back to the eCryptfs
      module.
      
      The eCryptfs module manages these request callouts to userspace code via
      message context structs.  The module maintains an array of message context
      structs and places the elements of the array on two lists: a free and an
      allocated list.  When eCryptfs wants to make a request, it moves a msg ctx
      from the free list to the allocated list, sets its state to pending, and fires
      off the message to the user's registered daemon.
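The free/allocated bookkeeping described above can be sketched in plain C. This is a minimal user-space illustration, not the eCryptfs source: every identifier here (`msg_ctx`, `msg_ctx_acquire`, `MSG_CTX_COUNT`, and so on) is a hypothetical name, and the locking that kernel code would need is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the eCryptfs source. */
enum msg_ctx_state { MSG_CTX_FREE, MSG_CTX_PENDING, MSG_CTX_DONE };

struct msg_ctx {
    enum msg_ctx_state state;
    unsigned int index;    /* position of this element in the array */
    unsigned int counter;  /* sequence number of the current request */
    struct msg_ctx *next;  /* link on the free or the allocated list */
};

#define MSG_CTX_COUNT 8    /* assumed pool size */

static struct msg_ctx msg_ctx_arr[MSG_CTX_COUNT];
static struct msg_ctx *free_list;
static struct msg_ctx *alloc_list;
static unsigned int next_counter;

/* Place every element of the array on the free list. */
static void msg_ctx_pool_init(void)
{
    free_list = NULL;
    alloc_list = NULL;
    next_counter = 0;
    for (unsigned int i = 0; i < MSG_CTX_COUNT; i++) {
        msg_ctx_arr[i].state = MSG_CTX_FREE;
        msg_ctx_arr[i].index = i;
        msg_ctx_arr[i].counter = 0;
        msg_ctx_arr[i].next = free_list;
        free_list = &msg_ctx_arr[i];
    }
}

/* Move one context from the free list to the allocated list and mark it
 * pending.  Returns NULL when the pool is exhausted. */
static struct msg_ctx *msg_ctx_acquire(void)
{
    struct msg_ctx *ctx = free_list;

    if (ctx == NULL)
        return NULL;
    free_list = ctx->next;
    ctx->next = alloc_list;
    alloc_list = ctx;
    ctx->state = MSG_CTX_PENDING;
    ctx->counter = ++next_counter;
    return ctx;
}

/* Unlink a context from the allocated list and return it to the free list. */
static void msg_ctx_release(struct msg_ctx *ctx)
{
    struct msg_ctx **pp = &alloc_list;

    assert(ctx != NULL && ctx->state != MSG_CTX_FREE);
    while (*pp != NULL && *pp != ctx)
        pp = &(*pp)->next;
    assert(*pp == ctx);
    *pp = ctx->next;
    ctx->state = MSG_CTX_FREE;
    ctx->next = free_list;
    free_list = ctx;
}
```

A fixed array with two lists keeps allocation cheap and lets a reply name its context by array index alone.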
      
      When eCryptfs receives a netlink message (via the callback), it correlates the
      msg ctx struct in the alloc list with the data in the message itself.
      msg->index gives the offset into the array of msg ctx structs.  eCryptfs verifies
      that the registered daemon PID is the same as the PID of the process that sent
      the message.  It also validates a sequence number between the received packet
      and the msg ctx.  Then, it copies the contents of the message (the reply
      packet) into the msg ctx struct, sets the state in the msg ctx to done, and
      wakes up the process that was sleeping while waiting for the reply.
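The reply checks in the paragraph above might look roughly like the following user-space sketch. The structure layouts, the `process_reply` name, and the fixed `REPLY_MAX` buffer are assumptions made for illustration; the real kernel code also wakes the sleeping task, which is only marked by a comment here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types and names; not the eCryptfs definitions. */
enum { CTX_FREE, CTX_PENDING, CTX_DONE };

#define CTX_COUNT 4
#define REPLY_MAX 64

struct reply_msg {
    unsigned int index;            /* offset into the context array */
    unsigned int seq;              /* sequence number echoed by the daemon */
    size_t len;                    /* number of valid bytes in data[] */
    unsigned char data[REPLY_MAX];
};

struct ctx {
    int state;
    unsigned int seq;
    size_t reply_len;
    unsigned char reply[REPLY_MAX];
};

static struct ctx ctx_arr[CTX_COUNT];

/* Validate a reply and store it in the matching context.
 * Returns 0 on success and -1 if the reply is rejected. */
static int process_reply(const struct reply_msg *msg, int sender_pid,
                         int daemon_pid)
{
    struct ctx *c;

    assert(msg != NULL);
    if (msg->index >= CTX_COUNT)
        return -1;                 /* index outside the array */
    c = &ctx_arr[msg->index];
    if (c->state != CTX_PENDING)
        return -1;                 /* nobody is waiting on this context */
    if (sender_pid != daemon_pid)
        return -1;                 /* not the registered daemon */
    if (msg->seq != c->seq)
        return -1;                 /* stale or forged sequence number */
    if (msg->len > REPLY_MAX)
        return -1;                 /* reply does not fit */
    memcpy(c->reply, msg->data, msg->len);
    c->reply_len = msg->len;
    c->state = CTX_DONE;
    /* Kernel code would wake the sleeping requester at this point. */
    return 0;
}
```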
      
      The sleeping process was whatever was performing the sys_open().  This process
      originally called ecryptfs_send_message(); it is now in
      ecryptfs_wait_for_response().  When it wakes up and sees that the msg ctx
      state was set to done, it returns a pointer to the message contents (the reply
      packet).  If all went well, this packet contains the decrypted
      FEK, which is then copied into the crypt_stat struct, and life continues as
      normal.
      
      The case for creation of a new file is very similar, only instead of a decrypt
      request, eCryptfs sends out an encrypt request.
      
      > - We have a great clod of key management code in-kernel.  Why is that
      >   not suitable (or growable) for public key management?
      
      eCryptfs uses Howells' keyring to store persistent key data and PKI state
      information.  It defers public key cryptographic transformations to userspace
      code.  The userspace data manipulation request really is orthogonal to key
      management in and of itself.  What eCryptfs basically needs is a secure way to
      communicate with a particular daemon for a particular task doing a syscall,
      based on the UID.  Nothing running under another UID should be able to access
      that channel of communication.
      
      > - Is it appropriate that new infrastructure for public key
      > management be private to a particular fs?
      
      The messaging.c file contains a lot of code that, perhaps, could be extracted
      into a separate kernel service.  In essence, this would be a sort of
      request/reply mechanism that would involve a userspace daemon.  I am not aware
      of anything that does quite what eCryptfs does, so I was not aware of any
      existing tools to do just what we wanted.
      
      >   What happens if one of these daemons exits without sending a quit
      >   message?
      
      There is a stale uid<->pid association in the hash table for that user.  When
      the user registers a new daemon, eCryptfs cleans up the old association and
      generates a new one.  See ecryptfs_process_helo().
      
      > - _why_ does it use netlink?
      
      Netlink provides the transport mechanism that would minimize the complexity of
      the implementation, given that we can have multiple daemons (one per user).  I
      explored the possibility of using relayfs, but that would involve having to
      introduce control channels and a protocol for creating and tearing down
      channels for the daemons.  We do not have to worry about any of that with
      netlink.
      Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] include/linux/nfsd/const.h: remove NFS_SUPER_MAGIC · b5d5dfbd
      Adrian Bunk committed
      NFS_SUPER_MAGIC is already defined in include/linux/magic.h
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] knfsd: SUNRPC: Provide room in svc_rqst for larger addresses · 27459f09
      Chuck Lever committed
      Expand the rq_addr field to allow it to contain larger addresses.
      
      Specifically, we replace a 'sockaddr_in' with a 'sockaddr_storage', then
      everywhere the 'sockaddr_in' was referenced, we use instead an accessor
      function (svc_addr_in) which safely casts the _storage to _in.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
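The accessor idea from the commit above can be illustrated in user-space C. This is only a sketch: `rqst_sketch` and `sketch_addr_in` are hypothetical stand-ins for the kernel's request structure and its `svc_addr_in` accessor, and the family check via `assert` is an addition for the illustration rather than something the commit message describes.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the request structure: the address field is
 * large enough for any address family, not just IPv4. */
struct rqst_sketch {
    struct sockaddr_storage rq_addr;
};

/* Return the stored address viewed as an IPv4 socket address.  Callers go
 * through this accessor instead of touching the field type directly. */
static struct sockaddr_in *sketch_addr_in(struct rqst_sketch *rqstp)
{
    assert(rqstp != NULL);
    assert(rqstp->rq_addr.ss_family == AF_INET);
    return (struct sockaddr_in *)&rqstp->rq_addr;
}
```

Funnelling every IPv4 access through one function means a later change to the stored type only has to touch the accessor.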
    • [PATCH] knfsd: SUNRPC: Add a function to format the address in an svc_rqst for printing · ad06e4bd
      Chuck Lever committed
      There are loads of places where the RPC server assumes that the rq_addr field
      contains an IPv4 address.  Top among these are error and debugging messages
      that display the server's IP address.
      
      Let's refactor the address printing into a separate function that's smart
      enough to figure out the difference between IPv4 and IPv6 addresses.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
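A family-aware address formatter of the kind this commit describes could be sketched in user space as follows. The function name `format_addr` and its fallback strings are assumptions for illustration; the kernel uses its own helpers rather than `inet_ntop()`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Format the address held in ss into buf, choosing the presentation by
 * address family.  Returns buf.  Hypothetical helper, not the kernel's. */
static const char *format_addr(const struct sockaddr_storage *ss,
                               char *buf, size_t len)
{
    assert(ss != NULL && buf != NULL && len > 0);

    switch (ss->ss_family) {
    case AF_INET: {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)ss;

        if (inet_ntop(AF_INET, &in4->sin_addr, buf, (socklen_t)len) == NULL)
            snprintf(buf, len, "(bad IPv4 address)");
        break;
    }
    case AF_INET6: {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)ss;

        if (inet_ntop(AF_INET6, &in6->sin6_addr, buf, (socklen_t)len) == NULL)
            snprintf(buf, len, "(bad IPv6 address)");
        break;
    }
    default:
        snprintf(buf, len, "(unknown family %d)", (int)ss->ss_family);
        break;
    }
    return buf;
}
```

A caller would pass a buffer of at least `INET6_ADDRSTRLEN` bytes so that either family fits.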
    • [PATCH] knfsd: SUNRPC: allow creating an RPC service without registering with portmapper · 482fb94e
      Chuck Lever committed
      Sometimes we need to create an RPC service but not register it with the local
      portmapper; the NFSv4 delegation callback is one example.
      
      Change the svc_makesock() API to allow optionally creating temporary or
      permanent sockets, optionally registering with the local portmapper, and make
      it return the ephemeral port of the new socket.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] pid: replace do/while_each_task_pid with do/while_each_pid_task · 41487c65
      Eric W. Biederman committed
      There isn't any real advantage to this change except that it allows the old
      functions to be removed, which eases maintenance and puts the code in a more
      uniform style.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] tty: update the tty layer to work with struct pid · ab521dc0
      Eric W. Biederman committed
      Of the kernel subsystems that work with pids, the tty layer is probably the
      largest consumer.  But it has the nice virtue that the association with a
      session only lasts until the session leader exits, which means that no
      reference counting is required.  So using struct pid winds up being a simple
      optimization to avoid hash table lookups.
      
      In the long term the use of pid_nr also ensures that when we have multiple pid
      spaces mixed everything will work correctly.
      Signed-off-by: Eric W. Biederman <eric@maxwell.lnxi.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] Minix V3 support · 939b00df
      Andries Brouwer committed
      This morning I needed to read a Minix V3 filesystem, but unfortunately my
      2.6.19 did not support that, and neither did the downloaded 2.6.20rc4.
      
      Fortunately, Google told me that Daniel Aragones had already done the work;
      the patch can be found at http://www.terra.es/personal2/danarag/
      
      Unfortunately, looking at the patch was painful to my eyes, so I polished it
      a bit before applying.  The resulting kernel boots, and reads the
      filesystem it needed to read.
      Signed-off-by: Daniel Aragones <danarag@gmail.com>
      Signed-off-by: Andries Brouwer <aeb@cwi.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] FS: speed up rw_verify_area() · 163da958
      Eric Dumazet committed
      oprofile hunting showed a stall in rw_verify_area(), because of triple
      indirection and potential cache misses.
      (file->f_path.dentry->d_inode->i_flock)
      
      By moving initialization of 'struct inode' pointer before the pos/count
      sanity tests, we allow the compiler and processor to perform two loads by
      anticipation, reducing stall, without prefetch() hints.  Even x86 arch has
      enough registers to not use temporary variables and not increase text size.
      
      I validated this patch running a bench and studied oprofile changes, and
      absolute perf of the test program.
      
      Results of my epoll_pipe_bench (source available on request) on a Pentium-M
      1.6 GHz machine
      
      Before :
      # ./epoll_pipe_bench -l 30 -t 20
      Avg: 436089 evts/sec read_count=8843037 write_count=8843040 21.218390 samples per call
      (best value out of 10 runs)
      
      After :
      # ./epoll_pipe_bench -l 30 -t 20
      Avg: 470980 evts/sec read_count=9549871 write_count=9549894 21.216694 samples per call
      (best value out of 10 runs)
      
      oprofile CPU_CLK_UNHALTED events gave a reduction from 5.3401 % to 2.5851 %
      for the rw_verify_area() function.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] warning fix: unsigned->signed · 3991d3bd
      Tomasz Kvarsin committed
      While compiling my code with -Wconversion using gcc-trunk, I always get a
      bunch of warnings from headers; here is a fix for them:
      
      __getblk is always called with an unsigned argument,
      but it takes a signed one; the same goes for __bread, __breadahead and so on.
      
      Signed-off-by: Tomasz Kvarsin
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] reiserfs: Use ARRAY_SIZE macro when appropriate · 79a81aef
      Ahmed S. Darwish committed
      Use ARRAY_SIZE macro already defined in kernel.h
      Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] inotify: read return val fix · f9e4acf3
      Nick Piggin committed
      Fix for inotify read bug (bugzilla.kernel.org #6999)
      
      Problem Description:
      When reading from an inotify device with an insufficiently sized buffer,
      read(2) will return 0 with no errno set.  This results from a logically
      incorrect action by the user program and thus should return a more logical
      value.  My suggestion is to return -EINVAL, as for bind(2).
      
      This patch is based on the proposal from Ryan <wolf0403@hotmail.com>, and
      feedback from John McCutchan <john@johnmccutchan.com>.
      
      Return -EINVAL if we have not passed in enough buffer space to read a single
      inotify event, rather than 0 which indicates that there is nothing to read.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: "John McCutchan" <john@johnmccutchan.com>
      Cc: Ryan <wolf0403@hotmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
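The return-value rule from the inotify commit above can be modelled with a small user-space function. This is a sketch of the decision only: `read_events_sketch` and its array of queued event sizes are hypothetical, and no data is actually copied.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Model of a read from an event queue.  event_sizes[] holds the size in
 * bytes of each queued event, oldest first; count is the caller's buffer
 * size.  Returns the number of bytes that would be copied, or -EINVAL when
 * the buffer cannot hold even the first queued event.  Hypothetical helper
 * for illustration only. */
static ssize_t read_events_sketch(const size_t *event_sizes, size_t nevents,
                                  size_t count)
{
    size_t copied = 0;

    assert(event_sizes != NULL || nevents == 0);
    for (size_t i = 0; i < nevents; i++) {
        if (event_sizes[i] > count - copied) {
            if (copied == 0)
                return -EINVAL;  /* buffer too small for a single event */
            break;               /* return what already fits */
        }
        copied += event_sizes[i];
    }
    return (ssize_t)copied;
}
```

Returning an error instead of 0 lets a caller tell "buffer too small" apart from "nothing queued".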
    • [PATCH] remove sb->s_files and file_list_lock usage in dquot.c · d003fb70
      Christoph Hellwig committed
      Iterate over sb->s_inodes instead of sb->s_files in add_dquot_ref.  This
      reduces list search and lock hold time as well as getting rid of one of the
      few uses of file_list_lock which Ingo identified as a scalability problem.
      
      Previously we called dq_op->initialize for every inode hanging off a
      writeable file that wasn't initialized before.  Now we're calling it for
      every inode that has a non-zero i_writecount, aka a writeable file
      descriptor referring to it.
      
      Thanks a lot to Jan Kara for running this patch through his quota test
      harness.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] move remove_dquot_ref to dqout.c · fb58b731
      Christoph Hellwig committed
      remove_dquot_ref can move to dquot.c instead of being in inode.c under
      #ifdef CONFIG_QUOTA.  Also clean the resulting code up a tiny little bit by
      testing sb->dq_op earlier - it's constant over a filesystem's lifetime.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] Fix d_path for lazy unmounts · eb3dfb0c
      Andreas Gruenbacher committed
      Here is a bugfix to d_path.
      
      First, when d_path() hits a lazily unmounted mount point, it tries to
      prepend the name of the lazily unmounted dentry to the path name.  It gets
      this wrong, and also overwrites the slash that separates the name from the
      following pathname component.  This is demonstrated by the attached test
      case, which prints "getcwd returned d_path-bugsubdir" with the bug.  The
      correct result would be "getcwd returned d_path-bug/subdir".
      
      It could be argued that the name of the root dentry should not be part of
      the result of d_path in the first place.  On the other hand, the name under
      which the unconnected namespace was once reachable may provide some useful
      hints to users, so that seems okay.
      
      Second, it isn't always possible to tell from the __d_path result whether
      the specified root and rootmnt (i.e., the chroot) was reached: lazy
      unmounts of bind mounts will produce a path that does start with a
      non-slash so we can tell from that, but other lazy unmounts will produce a
      path that starts with a slash, just like "ordinary" paths.
      
      The attached patch cleans up __d_path() to fix the bug with overlapping
      pathname components.  It also adds a @fail_deleted argument, which allows us
      to get rid of some of the mess in sys_getcwd().  Grabbing the dcache_lock
      can then also be moved into __d_path().  The patch also makes sure that
      paths will only start with a slash for paths which are connected to the
      root and rootmnt.
      
      The @fail_deleted argument could be added to d_path() as well: this would
      allow callers to recognize deleted files, without having to resort to the
      ambiguous check for the " (deleted)" string at the end of the pathnames.
      This is not currently done, but it might be worthwhile.
      Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
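The d_path commit above concerns a path that is assembled back to front in a buffer. The following user-space sketch shows that general technique and why each component needs its own separating slash; `prepend` and `build_path` are hypothetical helpers written for this illustration and do not reproduce the kernel's __d_path().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy str immediately in front of *pos inside buf and move *pos back.
 * Returns 0 on success, -1 if there is not enough room. */
static int prepend(char *buf, char **pos, const char *str)
{
    size_t len = strlen(str);

    assert(*pos >= buf);
    if ((size_t)(*pos - buf) < len)
        return -1;
    *pos -= len;
    memcpy(*pos, str, len);
    return 0;
}

/* Build a path from components listed leaf first, then put the name of the
 * (unreachable) root in front.  The slash is prepended as a separate step
 * so that the root name cannot overwrite it.  Returns a pointer into buf,
 * or NULL if buf is too small. */
static char *build_path(char *buf, size_t buflen,
                        const char *const *leaf_first, size_t n,
                        const char *root_name)
{
    char *pos;

    assert(buf != NULL && buflen > 0);
    pos = buf + buflen - 1;
    *pos = '\0';
    for (size_t i = 0; i < n; i++) {
        if (prepend(buf, &pos, leaf_first[i]) != 0 ||
            prepend(buf, &pos, "/") != 0)
            return NULL;
    }
    if (prepend(buf, &pos, root_name) != 0)
        return NULL;
    return pos;
}
```

With the single component "subdir" and the root name "d_path-bug", this yields "d_path-bug/subdir", the result the commit message calls correct.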
    • [PATCH] NTFS: rename incorrect check of NTFS_DEBUG with just DEBUG · 5c3bd438
      Robert P. J. Day committed
      Replace the incorrect debugging check of "#ifdef NTFS_DEBUG" with
      just "#ifdef DEBUG".
      Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
      Acked-by: Anton Altaparmakov <aia21@cantab.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] register_chrdev_region() don't hand out the LOCAL/EXPERIMENTAL majors · 215122e1
      Andrew Morton committed
      As pointed out in http://bugzilla.kernel.org/show_bug.cgi?id=7922, dynamic
      chardev major allocation can hand out majors which LANANA has defined as being
      for local/experimental use.
      
      Cc: Torben Mathiasen <device@lanana.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Tomas Klas <tomas.klas@mepatek.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
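The idea of skipping reserved majors during dynamic allocation can be sketched as below. This is a user-space illustration with hypothetical names; the ranges 60-63, 120-127 and 240-254 are taken on the assumption that they match the LOCAL/EXPERIMENTAL entries in the kernel's device list, and the top-down search order is likewise an assumption rather than a description of the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAJOR_MAX 255  /* assumed size of the major number table minus one */

/* Assumed LOCAL/EXPERIMENTAL ranges; check the device list before relying
 * on these numbers. */
static bool major_is_local_experimental(int major)
{
    return (major >= 60 && major <= 63) ||
           (major >= 120 && major <= 127) ||
           (major >= 240 && major <= 254);
}

/* Search downward for a free major that is not reserved for local or
 * experimental use.  Returns the major, or -1 if none is available. */
static int find_dynamic_major(const bool in_use[MAJOR_MAX + 1])
{
    assert(in_use != NULL);
    for (int major = MAJOR_MAX - 1; major > 0; major--) {
        if (major_is_local_experimental(major))
            continue;
        if (!in_use[major])
            return major;
    }
    return -1;
}
```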
    • [PATCH] Make XFS use BH_Unwritten and BH_Delay correctly · 6ab8eb1c
      David Chinner committed
      Don't hide buffer_unwritten behind buffer_delay() and remove the hack that
      clears unexpected buffer_unwritten() states now that it can't happen.
      Signed-off-by: Dave Chinner <dgc@sgi.com>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Cc: Timothy Shimmin <tes@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] Make BH_Unwritten a first class bufferhead flag V2 · 33a266dd
      David Chinner committed
      Currently, XFS uses BH_PrivateStart for flagging unwritten extent state in a
      bufferhead.  Recently, I found the long-standing mmap/unwritten extent
      conversion bug; it was caused by partial page invalidation not clearing
      the unwritten flag from bufferheads attached to the page but beyond EOF.  See
      here for a full explanation:
      
      http://oss.sgi.com/archives/xfs/2006-12/msg00196.html
      
      The solution I have checked into the XFS dev tree involves duplicating code
      from block_invalidatepage to clear the unwritten flag from the bufferhead(s),
      and then calling block_invalidatepage() to do the rest.
      
      Christoph suggested that this would be better solved by pushing the unwritten
      flag into the common buffer head flags and just adding the call to
      discard_buffer():
      
      http://oss.sgi.com/archives/xfs/2006-12/msg00239.html
      
      The following patch makes BH_Unwritten a first class citizen.
      Signed-off-by: Dave Chinner <dgc@sgi.com>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 12 Feb, 2007 21 commits