1. 04 9月, 2008 12 次提交
    • G
      dccp: Implement both feature-local and feature-remote Sequence Window feature · 51c7d4fa
      Gerrit Renker 提交于
      This adds full support for local/remote Sequence Window feature, from which the 
        * sequence-number-validity (W) and 
        * acknowledgment-number-validity (W') windows 
      derive as specified in RFC 4340, 7.5.3. 
      
      Specifically, the following changes are introduced:
        * integrated new socket fields into dccp_sk;
        * updated the update_gsr/gss routines with regard to these fields;
        * updated handler code: the Sequence Window feature is located at the TX side,
          so the local feature is meant if the handler-rx flag is false;
        * the initialisation of `rcv_wnd' in reqsk is removed, since
          - rcv_wnd is not used by the code anywhere;
          - sequence number checks are not done in the LISTEN state (cf. 7.5.3);
          - dccp_check_req checks the Ack number validity more rigorously;
        * the `struct dccp_minisock' became empty and is now removed.
      
      Until the handshake completes with activating negotiated values, the local/remote
      Sequence-Window values are undefined and thus can not reliably be estimated.
      This issue is addressed in a separate patch.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      51c7d4fa
    • G
      dccp: Initialisation framework for feature negotiation · 5d3dac26
      Gerrit Renker 提交于
      This initialises feature negotiation from two tables, which are initialised
      from sysctls. 
      
      As a novel feature, specifics of the implementation (e.g. currently short
      seqnos and ECN are not supported) are advertised for robustness.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      5d3dac26
    • G
      dccp ccid-2: Phase out the use of boolean Ack Vector sysctl · b235dc4a
      Gerrit Renker 提交于
      This removes the use of the sysctl and the minisock variable for the Send Ack
      Vector feature, which is now handled fully dynamically via feature negotiation;
      i.e. when CCID2 is enabled, Ack Vectors are automatically enabled (as per
      RFC 4341, 4.).
      
      Using a sysctl in parallel to this implementation would open the door to
      crashes, since much of the code relies on tests of the boolean minisock /
      sysctl variable. Thus, this patch replaces all tests of type
      
      	if (dccp_msk(sk)->dccpms_send_ack_vector)
      		/* ... */
      with
      	if (dp->dccps_hc_rx_ackvec != NULL)
      		/* ... */
      
      The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature
      negotiation concluded that Ack Vectors are to be used on the half-connection.
      Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
      so that the test is a valid one.
      
      The activation handler for Ack Vectors is called as soon as the feature
      negotiation has concluded at the
       * server when the Ack marking the transition RESPOND => OPEN arrives;
       * client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.
      
      Adding the sequence number of the Response packet to the Ack Vector has been 
      removed, since
       (a) connection establishment implies that the Response has been received;
       (b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
           this entry will always be ignored;
       (c) it can not be used for anything useful - to detect loss for instance, only
           packets received after the loss can serve as pseudo-dupacks.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      b235dc4a
    • G
      dccp: Remove manual influence on NDP Count feature · 68e074bf
      Gerrit Renker 提交于
      Updating the NDP count feature is handled automatically now:
       * for CCID-2 it is disabled, since the code does not use NDP counts;
       * for CCID-3 it is enabled, as NDP counts are used to determine loss lengths.
      
      Allowing the user to change NDP values leads to unpredictable and failing
      behaviour, since it is then possible to disable NDP counts even when they
      are needed (e.g. in CCID-3).
      
      This means that only those user settings are sensible that agree with the
      values for Send NDP Count implied by the choice of CCID. But those settings
      are already activated by the feature negotiation (CCID dependency tracking),
      hence this form of support is redundant.
      
      At startup the initialisation of the NDP count feature is with the default
      value of 0, which is done implicitly by the zeroing-out of the socket when
      it is allocated. If the choice of CCID or feature negotiation enables NDP
      count, this will then be updated via the NDP activation handler.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      68e074bf
    • G
      dccp: Remove obsolete parts of the old CCID interface · 78673e24
      Gerrit Renker 提交于
      The TX/RX CCIDs of the minisock are now redundant: similar to the Ack Vector
      case, their value equals initially that of the sysctl, but at the end of
      feature negotiation may be something different.
      
      The old interface removed by this patch thus has been replaced by the newer
      interface to dynamically query the currently loaded CCIDs earlier in this
      patch set.
      
      Also removed the constructors for the TX CCID and the RX CCID, since the
      switch rx/non-rx is done by the handler in minisocks.c (and the handler is
      the only place in the code where CCIDs are loaded).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      78673e24
    • G
      dccp: Set per-connection CCIDs via socket options · fade756f
      Gerrit Renker 提交于
      With this patch, TX/RX CCIDs can now be changed on a per-connection basis, which
      overrides the defaults set by the global sysctl variables for TX/RX CCIDs.
      
      To make full use of this facility, the remaining patches of this patch set are
      needed, which track dependencies and activate negotiated feature values.
      
      Note on the maximum number of CCIDs that can be registered:
      -----------------------------------------------------------
      The maximum number of CCIDs that can be registered on the socket is constrained
      by the space in a Confirm/Change feature negotiation option. 
      
      The space in these in turn depends on the size of header options as defined
      in RFC 4340, 5.8. Since this is a recurring constant, it has been moved from
      ackvec.h into linux/dccp.h, clarifying its purpose.
      
      Relative to this size, the maximum number of CCID identifiers that can be 
      present in a Confirm option (which always consumes 1 byte more than a Change
      option, cf. 6.1) is 2 bytes less than the maximum TLV size: one for the
      CCID-feature-type and one for the selected value.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      fade756f
    • G
      dccp: Deprecate Ack Ratio sysctl · 17c30b40
      Gerrit Renker 提交于
      This patch deprecates the Ack Ratio sysctl, since
       * Ack Ratio is entirely ignored by CCID-3 and CCID-4,
       * Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1);
       * even if it would work in CCID-2, there is no point for a user to change it:
         - Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2),
         - if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts 
           (since waiting for Acks which will never arrive in this window),
         - cwnd is not a user-configurable value.	
      
      The only reasonable place for Ack Ratio is to print it for debugging. It is
      planned to do this later on, as part of e.g. dccp_probe.
      
      With this patch Ack Ratio is now under full control of feature negotiation:
       * Ack Ratio is resolved as a dependency of the selected CCID;
       * if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to
         the default of 2, following RFC 4340, 11.3 - "New connections start with Ack
         Ratio 2 for both endpoints";
       * what happens then is part of another patch set, since it concerns the 
         dynamic update of Ack Ratio while the connection is in full flight.
      
      Thanks to Tomasz Grobelny for discussion leading up to this patch.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      17c30b40
    • G
      dccp: Feature negotiation for minimum-checksum-coverage · 20f41eee
      Gerrit Renker 提交于
      This provides feature negotiation for server minimum checksum coverage
      which so far has been missing.
      
      Since sender/receiver coverage values range only from 0...15, their
      type has also been reduced in size from u16 to u4.
      
      Feature-negotiation options are now generated for both sender and receiver
      coverage, i.e. when the peer has `forgotten' to enable partial coverage
      then feature negotiation will automatically enable (negotiate) the partial
      coverage value for this connection.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      20f41eee
    • G
      dccp: Deprecate old setsockopt framework · 668144f7
      Gerrit Renker 提交于
      The previous setsockopt interface, which passed socket options via struct 
      dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to
      ugly code since the old approach did not distinguish between NN and SP values.
      
      This patch removes the old setsockopt interface and replaces it with two new
      functions to register NN/SP values for feature negotiation. These are 
      essentially wrappers around the internal __feat_register functions, with 
      checking added to avoid
       * wrong usage (type);
       * changing values while the connection is in progress.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      668144f7
    • G
      dccp: Query supported CCIDs · 71bb4959
      Gerrit Renker 提交于
      This provides a data structure to record which CCIDs are locally supported
      and three accessor functions:
       - a test function for internal use which is used to validate CCID requests
         made by the user;
       - a copy function so that the list can be used for feature-negotiation;   
       - documented getsockopt() support so that the user can query capabilities.
      
      The data structure is a table which is filled in at compile-time with the
      list of available CCIDs (which in turn depends on the Kconfig choices).
      
      Using the copy function for cloning the list of supported CCIDs is useful for
      feature negotiation, since the negotiation is now with the full list of available
      CCIDs (e.g. {2, 3}) instead of the default value {2}. This means negotiation 
      will not fail if the peer requests to use CCID3 instead of CCID2. 
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      71bb4959
    • G
      dccp: Per-socket initialisation of feature negotiation · 828755ce
      Gerrit Renker 提交于
      This provides feature-negotiation initialisation for both DCCP sockets and
      DCCP request_sockets, to support feature negotiation during connection setup.
      
      It also resolves a FIXME regarding the congestion control initialisation.
      
      Thanks to Wei Yongjun for help with the IPv6 side of this patch.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      828755ce
    • G
      dccp: Implement lookup table for feature-negotiation information · b4eec206
      Gerrit Renker 提交于
      A lookup table for feature-negotiation information, extracted from RFC 4340/42,
      is provided by this patch. All currently known features can be found in this 
      table, along with their feature location, their default value, and type.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      b4eec206
  2. 30 8月, 2008 1 次提交
    • D
      net: Unbreak userspace usage of linux/mroute.h · 7c19a3d2
      David S. Miller 提交于
      Nothing in linux/pim.h should be exported to userspace.
      
      This should fix the XORP build failure reported by
      Jose Calhariz, the debain package maintainer.
      
      Nothing originally in linux/mroute.h was exported to userspace
      ever, but some of this stuff started to be when it was moved into
      this new linux/pim.h, and that was wrong.  If we didn't provide these
      definitions for 10 years we can reasonably expect that applications
      defined this stuff locally or used GLIBC headers providing the
      protocol definitions.  And as such the only result of this can
      be conflict and userland build breakage.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c19a3d2
  3. 21 8月, 2008 4 次提交
    • I
      fbdefio: add set_page_dirty handler to deferred IO FB · d847471d
      Ian Campbell 提交于
      Fixes kernel BUG at lib/radix-tree.c:473.
      
      Previously the handler was incidentally provided by tmpfs but this was
      removed with:
      
        commit 14fcc23f
        Author: Hugh Dickins <hugh@veritas.com>
        Date:   Mon Jul 28 15:46:19 2008 -0700
      
          tmpfs: fix kernel BUG in shmem_delete_inode
      
      relying on this behaviour was incorrect in any case and the BUG also
      appeared when the device node was on an ext3 filesystem.
      
      v2: override a_ops at open() time rather than mmap() time to minimise
      races per AKPM's concerns.
      Signed-off-by: NIan Campbell <ijc@hellion.org.uk>
      Cc: Jaya Kumar <jayakumar.lkml@gmail.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Johannes Weiner <hannes@saeurebad.de>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Kel Modderman <kel@otaku42.de>
      Cc: Markus Armbruster <armbru@redhat.com>
      Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
      Cc: <stable@kernel.org> [14fcc23f is in 2.6.25.14 and 2.6.26.1]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d847471d
    • N
      mm: dirty page tracking race fix · 479db0bf
      Nick Piggin 提交于
      There is a race with dirty page accounting where a page may not properly
      be accounted for.
      
      clear_page_dirty_for_io() calls page_mkclean; then TestClearPageDirty.
      
      page_mkclean walks the rmaps for that page, and for each one it cleans and
      write protects the pte if it was dirty.  It uses page_check_address to
      find the pte.  That function has a shortcut to avoid the ptl if the pte is
      not present.  Unfortunately, the pte can be switched to not-present then
      back to present by other code while holding the page table lock -- this
      should not be a signal for page_mkclean to ignore that pte, because it may
      be dirty.
      
      For example, powerpc64's set_pte_at will clear a previously present pte
      before setting it to the desired value.  There may also be other code in
      core mm or in arch which do similar things.
      
      The consequence of the bug is loss of data integrity due to msync, and
      loss of dirty page accounting accuracy.  XIP's __xip_unmap could easily
      also be unreliable (depending on the exact XIP locking scheme), which can
      lead to data corruption.
      
      Fix this by having an option to always take ptl to check the pte in
      page_check_address.
      
      It's possible to retain this optimization for page_referenced and
      try_to_unmap.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Jared Hulbert <jaredeh@gmail.com>
      Cc: Carsten Otte <cotte@freenet.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      479db0bf
    • K
      fix setpriority(PRIO_PGRP) thread iterator breakage · 2d70b68d
      Ken Chen 提交于
      When user calls sys_setpriority(PRIO_PGRP ...) on a NPTL style multi-LWP
      process, only the task leader of the process is affected, all other
      sibling LWP threads didn't receive the setting.  The problem was that the
      iterator used in sys_setpriority() only iteartes over one task for each
      process, ignoring all other sibling thread.
      
      Introduce a new macro do_each_pid_thread / while_each_pid_thread to walk
      each thread of a process.  Convert 4 call sites in {set/get}priority and
      ioprio_{set/get}.
      Signed-off-by: NKen Chen <kenchen@google.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2d70b68d
    • D
      Reserve NFS fileid values for btrfs · e4464fac
      David Woodhouse 提交于
      Purely cosmetic for now, but we might as well get it merged ASAP.
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e4464fac
  4. 19 8月, 2008 4 次提交
  5. 17 8月, 2008 3 次提交
  6. 16 8月, 2008 4 次提交
  7. 15 8月, 2008 5 次提交
  8. 14 8月, 2008 6 次提交
    • D
      security: Fix setting of PF_SUPERPRIV by __capable() · 5cd9c58f
      David Howells 提交于
      Fix the setting of PF_SUPERPRIV by __capable() as it could corrupt the flags
      the target process if that is not the current process and it is trying to
      change its own flags in a different way at the same time.
      
      __capable() is using neither atomic ops nor locking to protect t->flags.  This
      patch removes __capable() and introduces has_capability() that doesn't set
      PF_SUPERPRIV on the process being queried.
      
      This patch further splits security_ptrace() in two:
      
       (1) security_ptrace_may_access().  This passes judgement on whether one
           process may access another only (PTRACE_MODE_ATTACH for ptrace() and
           PTRACE_MODE_READ for /proc), and takes a pointer to the child process.
           current is the parent.
      
       (2) security_ptrace_traceme().  This passes judgement on PTRACE_TRACEME only,
           and takes only a pointer to the parent process.  current is the child.
      
           In Smack and commoncap, this uses has_capability() to determine whether
           the parent will be permitted to use PTRACE_ATTACH if normal checks fail.
           This does not set PF_SUPERPRIV.
      
      Two of the instances of __capable() actually only act on current, and so have
      been changed to calls to capable().
      
      Of the places that were using __capable():
      
       (1) The OOM killer calls __capable() thrice when weighing the killability of a
           process.  All of these now use has_capability().
      
       (2) cap_ptrace() and smack_ptrace() were using __capable() to check to see
           whether the parent was allowed to trace any process.  As mentioned above,
           these have been split.  For PTRACE_ATTACH and /proc, capable() is now
           used, and for PTRACE_TRACEME, has_capability() is used.
      
       (3) cap_safe_nice() only ever saw current, so now uses capable().
      
       (4) smack_setprocattr() rejected accesses to tasks other than current just
           after calling __capable(), so the order of these two tests have been
           switched and capable() is used instead.
      
       (5) In smack_file_send_sigiotask(), we need to allow privileged processes to
           receive SIGIO on files they're manipulating.
      
       (6) In smack_task_wait(), we let a process wait for a privileged process,
           whether or not the process doing the waiting is privileged.
      
      I've tested this with the LTP SELinux and syscalls testscripts.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Acked-by: NCasey Schaufler <casey@schaufler-ca.com>
      Acked-by: NAndrew G. Morgan <morgan@kernel.org>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      5cd9c58f
    • F
      usb: musb: pass configuration specifics via pdata · ca6d1b13
      Felipe Balbi 提交于
      Use platform_data to pass musb configuration-specific
      details to musb driver.
      
      This patch will prevent that other platforms selecting
      HAVE_CLK and enabling musb won't break tree building.
      
      The other parts of it will come when linux-omap merge
      up more omap2/3 board-files.
      Signed-off-by: NFelipe Balbi <felipe.balbi@nokia.com>
      Acked-by: NPaul Mundt <lethal@linux-sh.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      ca6d1b13
    • F
      USB: Add MUSB and TUSB support · 550a7375
      Felipe Balbi 提交于
      This patch adds support for MUSB and TUSB controllers
      integrated into omap2430 and davinci. It also adds support
      for external tusb6010 controller.
      
      Cc: David Brownell <dbrownell@users.sourceforge.net>
      Cc: Tony Lindgren <tony@atomide.com>
      Signed-off-by: NFelipe Balbi <felipe.balbi@nokia.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      
      550a7375
    • A
      usb-serial: don't release unregistered minors · 0282b7f2
      Alan Stern 提交于
      This patch (as1121) fixes a bug in the USB serial core.  When a device
      is unregistered, the core will give back its minors -- even if the
      device hasn't been assigned any!
      
      The patch reserves the highest minor value (255) to mean that no minor
      was assigned.  It also removes some dead code and does a small style
      fixup.
      Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
      Cc: stable <stable@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      0282b7f2
    • A
      USB: add missing kerneldoc line for "needs_binding" · f4f4d587
      Alan Stern 提交于
      This patch (as1117) adds a kerneldoc line for the "needs_binding"
      field in struct usb_interface.  It was accidentally omitted when the
      field was added.
      Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      f4f4d587
    • D
      CRED: Introduce credential access wrappers · 9e2b2dc4
      David Howells 提交于
      The patches that are intended to introduce copy-on-write credentials for 2.6.28
      require abstraction of access to some fields of the task structure,
      particularly for the case of one task accessing another's credentials where RCU
      will have to be observed.
      
      Introduced here are trivial no-op versions of the desired accessors for current
      and other tasks so that other subsystems can start to be converted over more
      easily.
      
      Wrappers are introduced into a new header (linux/cred.h) for UID/GID,
      EUID/EGID, SUID/SGID, FSUID/FSGID, cap_effective and current's subscribed
      user_struct.  These wrappers are macros because the ordering between header
      files mitigates against making them inline functions.
      
      linux/cred.h is #included from linux/sched.h.
      
      Further, XFS is modified such that it no longer defines and uses parameterised
      versions of current_fs[ug]id(), thus getting rid of the namespace collision
      otherwise incurred.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      9e2b2dc4
  9. 13 8月, 2008 1 次提交
    • D
      [XFS] extend completions to provide XFS object flush requirements · 39d2f1ab
      David Chinner 提交于
      XFS object flushing doesn't quite match existing completion semantics.  It
      mixed exclusive access with completion.  That is, we need to mark an object as
      being flushed before flushing it to disk, and then block any other attempt to
      flush it until the completion occurs.  We do this but adding an extra count to
      the completion before we start using them.  However, we still need to
      determine if there is a completion in progress, and allow no-blocking attempts
      fo completions to decrement the count.
      
      To do this we introduce:
      
      int try_wait_for_completion(struct completion *x)
      	returns a failure status if done == 0, otherwise decrements done
      	to zero and returns a "started" status. This is provided
      	to allow counted completions to begin safely while holding
      	object locks in inverted order.
      
      int completion_done(struct completion *x)
      	returns 1 if there is no waiter, 0 if there is a waiter
      	(i.e. a completion in progress).
      
      This replaces the use of semaphores for providing this exclusion
      and completion mechanism.
      
      SGI-PV: 981498
      
      SGI-Modid: xfs-linux-melb:xfs-kern:31816a
      Signed-off-by: NDavid Chinner <david@fromorbit.com>
      Signed-off-by: NLachlan McIlroy <lachlan@sgi.com>
      39d2f1ab