1. 01 Apr, 2006 23 commits
  2. 31 Mar, 2006 3 commits
    • [NET]: Allow skb headroom to be overridden · 025be81e
      Anton Blanchard committed
      Previously we added NET_IP_ALIGN so an architecture can override the
      padding done to align headers. The next step is to allow the skb
      headroom to be overridden.
      
      We currently always reserve 16 bytes to grow into, meaning all DMAs
      start 16 bytes into a cacheline. On ppc64 we really want DMA writes to
      start on a cacheline boundary, so we increase that headroom to one
      cacheline.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
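The alignment argument above can be checked with a little arithmetic. This is a minimal sketch, not kernel code: the function name is hypothetical, the buffer is assumed to start on a cacheline boundary, and the 128-byte cacheline used in the usage note is an assumption about the ppc64 systems the message has in mind.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given the headroom reserved at the front of a
 * buffer that itself starts on a cacheline boundary, return how far
 * into a cacheline the first DMA write lands.
 */
size_t dma_offset_in_cacheline(size_t headroom, size_t cacheline)
{
    return headroom % cacheline;
}
```

With a 16-byte headroom and an assumed 128-byte cacheline the DMA starts 16 bytes into the line; raising the headroom to one full cacheline brings the offset to 0.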
    • [PATCH] splice: add support for SPLICE_F_MOVE flag · 5abc97aa
      Jens Axboe committed
      This enables the caller to migrate pages from one address space page
      cache to another.  In buzz word marketing, you can do zero-copy file
      copies!
      Signed-off-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Introduce sys_splice() system call · 5274f052
      Jens Axboe committed
      This adds support for the sys_splice system call. Using a pipe as a
      transport, it can connect to files or sockets (latter as output only).
      
      From the splice.c comments:
      
         "splice": joining two ropes together by interweaving their strands.
      
         This is the "extended pipe" functionality, where a pipe is used as
         an arbitrary in-memory buffer. Think of a pipe as a small kernel
         buffer that you can use to transfer data from one end to the other.
      
         The traditional unix read/write is extended with a "splice()" operation
         that transfers data buffers to or from a pipe buffer.
      
         Named by Larry McVoy, original implementation from Linus, extended by
         Jens to support splicing to files and fixing the initial implementation
         bugs.
      Signed-off-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
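From user space, the "pipe as transport" idea might look like the sketch below, written against the splice(2) interface as glibc exposes it on current Linux rather than against this 2006 patch. The function name splice_copy and the 64 KiB chunk size are assumptions made for illustration. SPLICE_F_MOVE is passed only as a hint; later kernels are documented to treat it as a no-op.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy everything from in_fd to out_fd, using a
 * pipe as the in-kernel buffer. Returns bytes copied, or -1 on error.
 */
ssize_t splice_copy(int in_fd, int out_fd)
{
    int p[2];
    ssize_t total = 0;

    if (pipe(p) < 0)
        return -1;

    for (;;) {
        /* Step 1: move data from the input file into the pipe. */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, 65536, SPLICE_F_MOVE);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)
            break; /* end of input */

        /* Step 2: drain the pipe into the output file. */
        while (n > 0) {
            ssize_t w = splice(p[0], NULL, out_fd, NULL, (size_t)n,
                               SPLICE_F_MOVE);
            if (w <= 0) {
                total = -1;
                goto out;
            }
            n -= w;
            total += w;
        }
    }
out:
    close(p[0]);
    close(p[1]);
    return total;
}
```

Both descriptors are used at their current file offsets, so a caller that has just written the source file needs to seek back to the start first.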
  3. 30 Mar, 2006 4 commits
    • [PATCH] libata: Simplex and other mode filtering logic · 5444a6f4
      Alan Cox committed
      Add a field to the host_set called 'flags' (formerly host_set_flags, renamed
      to suit Jeff)
      Add a simplex_claimed field so we can remember who owns the DMA channel
      Add a ->mode_filter() hook to allow drivers to filter modes
      Add docs for mode_filter and set_mode
      Filter according to simplex state
      Filter cable in core
      
      This provides the needed framework to support all the mode rules found
      in the PATA world. The simplex filter deals with 'to spec' simplex DMA
      systems found in older chips. The cable filter avoids duplicating the
      same rules in each chip driver with PATA. Finally the mode filter is
      necessary because drive/chip combinations have errata that forbid
      certain modes with some drives or types of ATA object.
      
      Drive speed setup remains per channel for now and the filters now use
      the framework Tejun put into place which cleans them up a lot from the
      older libata-pata patches.
      Signed-off-by: Alan Cox <alan@redhat.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
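The three filters described above compose naturally as successive masks over a set of permitted modes. The sketch below illustrates that idea only; every type, field and constant in it is hypothetical and none of it is libata's actual interface. It assumes the fastest UDMA mode needs an 80-wire cable and that a simplex host lets only one port use DMA at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mode bits, one per transfer mode. */
enum {
    MODE_PIO4   = 1u << 0,
    MODE_MWDMA2 = 1u << 1,
    MODE_UDMA2  = 1u << 2,
    MODE_UDMA5  = 1u << 3, /* assumed to need an 80-wire cable */
};
#define MODE_ANY_DMA (MODE_MWDMA2 | MODE_UDMA2 | MODE_UDMA5)

struct port;

struct host {
    int simplex;                 /* only one DMA user at a time */
    const void *simplex_claimed; /* port that owns the DMA channel */
};

struct port {
    struct host *host;
    int cable_80wire;
    /* Optional driver hook, applied last. */
    unsigned (*mode_filter)(const struct port *, unsigned);
};

/* Example driver hook: this imaginary chip has an MWDMA erratum. */
unsigned drop_mwdma2(const struct port *ap, unsigned mask)
{
    (void)ap;
    return mask & ~(unsigned)MODE_MWDMA2;
}

unsigned filter_modes(const struct port *ap, unsigned mask)
{
    /* Cable filter, done once in the core. */
    if (!ap->cable_80wire)
        mask &= ~(unsigned)MODE_UDMA5;

    /* Simplex filter: another port already owns the DMA channel. */
    if (ap->host->simplex && ap->host->simplex_claimed != NULL &&
        ap->host->simplex_claimed != ap)
        mask &= ~(unsigned)MODE_ANY_DMA;

    /* Driver-specific filter for chip or drive errata. */
    if (ap->mode_filter != NULL)
        mask = ap->mode_filter(ap, mask);

    return mask;
}
```

Keeping the cable and simplex rules in one core function is what lets each chip driver supply only its own errata hook.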
    • [PATCH] libata: Add ->set_mode hook for odd drivers · e35a9e01
      Alan Cox committed
      Some hardware doesn't want the usual mode setup logic running. This
      allows the hardware driver to replace it for special cases in the least
      invasive way possible.
      Signed-off-by: Alan Cox <alan@redhat.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • [PATCH] libata: BMDMA handling updates · 4e5ec5db
      Alan Cox committed
      This is the minimal patch set to enable the current code to be used with
      a controller following SFF (i.e. any PATA and early SATA controllers)
      safely without crashes if there is no BMDMA area or if BMDMA is not
      assigned by the BIOS for some reason.
      
      Simplex status is recorded but not acted upon in this change; this isn't
      a problem with the current drivers as none of them are for simplex
      hardware. A following diff will deal with that.
      
      The flags in the probe structure remain ->host_set_flags although Jeff
      asked me to rename them, simply because the rename would break the usual
      Linux rules that old code should break when there are changes, not
      compile and run and then blow up/eat your computer/etc. Renaming this
      later is a trivial exercise once a better name is chosen.
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • [NET]: Deinline some larger functions from netdevice.h · 56079431
      Denis Vlasenko committed
      On an allyesconfig'ured kernel:
      
      Size  Uses Wasted Name and definition
      ===== ==== ====== ================================================
         95  162  12075 netif_wake_queue      include/linux/netdevice.h
        129   86   9265 dev_kfree_skb_any     include/linux/netdevice.h
        127   56   5885 netif_device_attach   include/linux/netdevice.h
         73   86   4505 dev_kfree_skb_irq     include/linux/netdevice.h
         46   60   1534 netif_device_detach   include/linux/netdevice.h
        119   16   1485 __netif_rx_schedule   include/linux/netdevice.h
        143    5    492 netif_rx_schedule     include/linux/netdevice.h
         81    7    366 netif_schedule        include/linux/netdevice.h
      
      netif_wake_queue is big because __netif_schedule is a big inline:
      
      static inline void __netif_schedule(struct net_device *dev)
      {
              if (!test_and_set_bit(__LINK_STATE_SCHED, &dev->state)) {
                      unsigned long flags;
                      struct softnet_data *sd;
      
                      local_irq_save(flags);
                      sd = &__get_cpu_var(softnet_data);
                      dev->next_sched = sd->output_queue;
                      sd->output_queue = dev;
                      raise_softirq_irqoff(NET_TX_SOFTIRQ);
                      local_irq_restore(flags);
              }
      }
      
      static inline void netif_wake_queue(struct net_device *dev)
      {
      #ifdef CONFIG_NETPOLL_TRAP
              if (netpoll_trap())
                      return;
      #endif
              if (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state))
                      __netif_schedule(dev);
      }
      
      By de-inlining __netif_schedule we are saving a lot of text
      at each callsite of netif_wake_queue and netif_schedule.
      __netif_rx_schedule is also big, and it makes more sense to keep
      both of them out of line.
      
      Patch also deinlines dev_kfree_skb_any. We can deinline dev_kfree_skb_irq
      instead... oh well.
      
      netif_device_attach/detach are not hot paths, we can deinline them too.
      Signed-off-by: Denis Vlasenko <vda@ilport.com.ua>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
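The commit message does not say how the "Wasted" column in the size table was computed. Every row is, however, consistent with (Size - 20) * (Uses - 1): one copy of the body is needed in any case, and each further use wastes the body minus a fixed amount. The constant 20 is inferred from the figures (plausibly the cost attributed to an out-of-line call), so treat the formula below as a reconstruction under that assumption, with a hypothetical function name.

```c
#include <assert.h>

/*
 * Reconstructed estimate of text wasted by inlining a function of
 * `size` bytes at `uses` call sites. The constant 20 is inferred from
 * the table, not taken from the commit message.
 */
long wasted_bytes(long size, long uses)
{
    return (size - 20) * (uses - 1);
}
```

For netif_wake_queue this gives (95 - 20) * (162 - 1) = 12075, matching the first row.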
  4. 29 Mar, 2006 10 commits
    • [PATCH] cleanup __exit_signal->cleanup_sighand path · a7e5328a
      Oleg Nesterov committed
      Move 'tsk->sighand = NULL' from cleanup_sighand() to __exit_signal().  This
      makes the exit path more understandable and allows us to do
      cleanup_sighand() outside of ->siglock protected section.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pids: kill PIDTYPE_TGID · 47e65328
      Oleg Nesterov committed
      This patch kills PIDTYPE_TGID pid_type thus saving one hash table in
      kernel/pid.c and speeding up subthreads create/destroy a bit.  It is also a
      preparation for the further tref/pids rework.
      
      This patch adds 'struct list_head thread_group' to 'struct task_struct'
      instead.
      
      We don't detach group leader from PIDTYPE_PID namespace until another
      thread inherits its ->pid == ->tgid, so we are safe wrt premature
      free_pidmap(->tgid) call.
      
      Currently there are no users of find_task_by_pid_type(PIDTYPE_TGID).
      Should the need arise, we can use find_task_by_pid()->group_leader.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Acked-By: Eric Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] move __exit_signal() to kernel/exit.c · 6a14c5c9
      Oleg Nesterov committed
      __exit_signal() is private to release_task() now.  I think it is better to
      make it static in kernel/exit.c and export flush_sigqueue() instead - this
      function is much simpler and more straightforward.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] rename __exit_sighand to cleanup_sighand · c81addc9
      Oleg Nesterov committed
      Cosmetic, rename __exit_sighand to cleanup_sighand and move it close to
      copy_sighand().
      
      This matches copy_signal/cleanup_signal naming, and I think it is easier to
      follow.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] copy_process: cleanup bad_fork_cleanup_signal · 6b3934ef
      Oleg Nesterov committed
      __exit_signal() does important cleanups atomically under ->siglock.  It is
      also called from copy_process's error path.  This is not good, for example we
      can't move __unhash_process() under ->siglock for that reason.
      
      We should not mix these 2 paths, just look at ugly 'if (p->sighand)' under
      'bad_fork_cleanup_sighand:' label.  For copy_process() case it is sufficient
      to just backout copy_signal(), nothing more.
      
      Again, nobody can see this task yet.  For CLONE_THREAD case we just decrement
      signal->count, otherwise nobody can see this ->signal and we can free it
      lockless.
      
      This patch assumes it is safe to do exit_thread_group_keys() without
      tasklist_lock.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] copy_process: cleanup bad_fork_cleanup_sighand · 7001510d
      Oleg Nesterov committed
      The only caller of exit_sighand(tsk) is copy_process's error path.  We can
      call __exit_sighand() directly and kill exit_sighand().
      
      This 'tsk' was not yet registered in pid_hash[] or init_task.tasks, it has no
      external references, nobody can see it, and
      
      	IF (clone_flags & CLONE_SIGHAND)
      		At least 'current' has a reference to ->sighand, this
      		means atomic_dec_and_test(sighand->count) can't be true.
      
      	ELSE
      		Nobody can see this ->sighand, this means we can free it
      		without any locking.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] introduce lock_task_sighand() helper · f63ee72e
      Oleg Nesterov committed
      Add lock_task_sighand() helper and convert group_send_sig_info() to use
      it.  Hopefully we will have more users soon.
      
      This patch also removes '!sighand->count' and '!p->usage' checks, I think
      they both are bogus, racy and unneeded (but probably it makes sense to
      restore them as BUG_ON()s).
      
      ->sighand is cleared and its ->count is decremented in release_task() with
      sighand->siglock held, so it is a bug to have '!p->usage || !->count' after
      we already locked and verified it is the same.  On the other hand, an
      already dead task without ->sighand can have a non-zero ->usage due to
      ptrace, for example.
      
      If we read the stale value of ->sighand we must see the change after
      spin_lock(), because that change was done while holding that same old
      ->sighand.siglock.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
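The lock-then-recheck pattern this message describes can be sketched in user space as follows. This is an analogue under stated assumptions, not the kernel helper: all names are hypothetical, a C11 atomic_flag stands in for the spinlock, and the lock object is assumed never to be freed while anyone may still try to take it (the kernel needs its own guarantee for that).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct sighand {
    atomic_flag lock; /* plays the role of ->siglock */
    int count;
};

struct task {
    struct sighand *_Atomic sighand; /* may be cleared concurrently */
};

void sighand_lock(struct sighand *s)
{
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ; /* spin until the lock is free */
}

void sighand_unlock(struct sighand *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/*
 * Return the task's sighand with its lock held, or NULL if the task
 * no longer has one. The pointer read before locking may be stale, so
 * it is re-read under the lock and the attempt retried on mismatch.
 */
struct sighand *lock_sighand(struct task *t)
{
    for (;;) {
        struct sighand *s = atomic_load(&t->sighand);
        if (s == NULL)
            return NULL;
        sighand_lock(s);
        if (s == atomic_load(&t->sighand))
            return s; /* still the same object: locked and verified */
        sighand_unlock(s);
    }
}
```

A caller that receives a non-NULL pointer owns the lock and must release it with sighand_unlock().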
    • [PATCH] convert sighand_cache to use SLAB_DESTROY_BY_RCU · aa1757f9
      Oleg Nesterov committed
      This patch borrows Hugh's clever 'struct anon_vma' trick.
      
      Without tasklist_lock held we can't trust task->sighand until we locked it
      and re-checked that it is still the same.
      
      But this means we don't need to defer 'kmem_cache_free(sighand)'.  We can
      return the memory to slab immediately, all we need is to be sure that
      sighand->siglock can't disappear inside an RCU-protected section.
      
      To do so we need to initialize ->siglock inside ctor function,
      SLAB_DESTROY_BY_RCU does the rest.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pidhash: don't use zero pids · c7c64641
      Oleg Nesterov committed
      daemonize() calls set_special_pids(1,1), while init and kernel threads spawned
      from init/main.c:init() run with 0,0 special pids.  This patch changes
      INIT_SIGNALS() so that they run with ->pgrp == ->session == 1 also.  This
      patch relies on the fact that swapper's pid == 1.
      
      Now we have no hashed zero pids in pid_hash[].
      
      The user-space visible change is that now /sbin/init runs with (1,1) special
      pids and becomes a session leader.
      
      Quoting Eric W. Biederman:
      >
      > daemonize consuming pids (1,1) then consumes pgrp 1.  So that when
      > /sbin/init calls setsid() it thinks /sbin/init is a process group
      > leader and setsid() fails.  So /sbin/init wants pgrp 1 session 1
      > but doesn't get it.  I am pretty certain daemonize did not exist so
      > /sbin/init got pgrp 1 session 1 in 2.4.
      >
      > That is the bug that is being fixed.
      >
      > This patch takes things one step farther and essentially calls
      > setsid() for pid == 1 before init is execed.  That is new behavior
      > but it cleans up the kernel as we now do not need to support the
      > case of a process without a process group or a session.
      >
      > The only process that could have possibly cared was /sbin/init
      > and it already calls setsid() because it doesn't want that.
      >
      > If this was going to break anything noticeable the change in behavior
      > from 2.4 to 2.6 would have already done that.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
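The setsid() failure mentioned in the quoted explanation comes from a POSIX rule: the call is refused with EPERM when the caller is already a process group leader, or when some other process's group ID equals the caller's pid. The quote describes the second case, which is awkward to stage from user space, so the sketch below demonstrates the first case only. The function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child, make it a process group leader, then try setsid().
 * Returns 1 if setsid() failed with EPERM as POSIX specifies,
 * 0 if it behaved otherwise, and -1 if fork/wait itself failed.
 */
int setsid_fails_for_group_leader(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: become leader of a new process group. */
        if (setpgid(0, 0) != 0)
            _exit(2);
        /* A group leader may not start a new session. */
        if (setsid() == (pid_t)-1 && errno == EPERM)
            _exit(0);
        _exit(1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

A freshly forked child that skips the setpgid() step is not a group leader, which is why daemons conventionally fork before calling setsid().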
    • [PATCH] pidhash: don't count idle threads · 73b9ebfe
      Oleg Nesterov committed
      fork_idle() does unhash_process() just after copy_process().  In contrast,
      boot_cpu's idle thread explicitly registers itself for each pid_type with nr
      = 0.
      
      copy_process() already checks p->pid != 0 before process_counts++, I think we
      can just skip attach_pid() calls and job control inits for idle threads and
      kill unhash_process().  We don't need to cleanup ->proc_dentry in fork_idle()
      because with this patch idle threads are never hashed in
      kernel/pid.c:pid_hash[].
      
      We don't need to hash pid == 0 in pidmap_init().  free_pidmap() is never
      called with pid == 0 arg, so it will never be reused.  So it is still possible
      to use pid == 0 in any PIDTYPE_xxx namespace from kernel/pid.c's POV.
      
      However with this patch we don't hash pid == 0 for PIDTYPE_PID case.  We still
      have PIDTYPE_PGID/PIDTYPE_SID entries with pid == 0: /sbin/init and
      kernel threads which don't call daemonize().
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>