[linux-pm] [RFC PATCH 0/4] timers: framework for migration between CPU

Ingo Molnar mingo at elte.hu
Fri Feb 20 05:21:45 PST 2009


* Arun R Bharadwaj <arun at linux.vnet.ibm.com> wrote:

> Hi,
> 
> 
> In an SMP system, tasks are scheduled on different CPUs by the 
> scheduler and interrupts are managed by the irqbalancer daemon, 
> but timers are still stuck on the CPUs on which they were 
> initialised.  Timers queued by tasks get re-queued on the CPU 
> where the task runs next, but timers from IRQ context, like 
> the ones in device drivers, are still stuck on the CPU on 
> which they were initialised.  This framework will help move 
> all 'movable timers' from one CPU to any other CPU of choice 
> using a sysfs interface.

hm, the intention is good, the concept of migrating timers to 
their target CPU is good as well. We already do some of that for 
regular timers.

But the whole sysfs interface you implemented here is neither 
particularly clean nor efficient.

The main problem is that timers are really fast-moving entities, 
and so are the tasks they are related to.

Your implementation completely ties the direction of migration 
(the timer scheduling) to a clumsy sysfs interface:

+	if (sscanf(buf, "%d", &target_cpu) && cpu_online(target_cpu)) {
+		ret = count;
+		per_cpu(enable_timer_migration, cpu->sysdev.id) = target_cpu;
+	}

That doesn't really scale and I doubt it works in practice. We 
should not schedule timers via sysfs; we should let the kernel 
do it automatically. [*]

So what I'd suggest instead is to extend the scheduler 
power-saving code, which already identifies a 'load balancer 
CPU', to also attract all attractable sources of timers - 
automatically. See the 'load_balancer' CPU logic in 
kernel/sched.c.

Does that sound OK to you? I think the end result might even 
give better numbers - and out of the box.

I'd also suggest dropping that rather ugly 
enable_timer_migration per-CPU variable and simply reusing the 
existing nohz.load_balancer as the target CPU.
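
To make the suggestion concrete, here is a minimal user-space sketch 
of the idea, not kernel code. It assumes a single global that stands 
in for nohz.load_balancer and routes every movable timer to that CPU 
whenever one is nominated. The names sketch_timer, pick_timer_cpu(), 
queue_timer(), NO_BALANCER and the 'pinned' flag are all hypothetical 
and only illustrate the policy; a real implementation would have to 
deal with locking and per-CPU timer bases.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS     4
#define NO_BALANCER (-1)

/*
 * Hypothetical stand-in for nohz.load_balancer: the CPU nominated by
 * the power-saving code, or NO_BALANCER when no CPU is nominated.
 */
static int load_balancer = NO_BALANCER;

/* Hypothetical, heavily simplified timer. */
struct sketch_timer {
	unsigned long expires;
	bool pinned;	/* must stay on the CPU that queues it */
	int cpu;	/* CPU whose queue holds the timer, -1 if unqueued */
};

/*
 * Pick the CPU a timer should be queued on. Pinned timers stay where
 * they are queued; movable timers are attracted to the load balancer
 * CPU when one is nominated. No user-space input is involved.
 */
static int pick_timer_cpu(const struct sketch_timer *t, int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < NR_CPUS);

	if (t->pinned || load_balancer == NO_BALANCER)
		return this_cpu;

	return load_balancer;
}

/* Queue (or re-queue) a timer from this_cpu with a new expiry. */
static void queue_timer(struct sketch_timer *t, int this_cpu,
			unsigned long expires)
{
	t->expires = expires;
	t->cpu = pick_timer_cpu(t, this_cpu);
}
```

With load_balancer set to 0, calling queue_timer() from CPU 2 places 
a movable timer on CPU 0, while a pinned timer stays on CPU 2. The 
target follows whatever the scheduler nominates at queue time, which 
is the point of not keeping a separate per-CPU target variable.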

Also, please base your patches on the latest timer tree (which 
has already modified some of this code in this cycle):

  http://people.redhat.com/mingo/tip.git/README

Btw., could you please also fix your mailer to not do this to 
us:

Mail-Followup-To: linux-kernel at vger.kernel.org,
        linux-pm at lists.linux-foundation.org, a.p.zijlstra at chello.nl,
        ego at in.ibm.com, tglx at linutronix.de, mingo at elte.hu,
        andi at firstfloor.org, venkatesh.pallipadi at intel.com,
        vatsa at linux.vnet.ibm.com, arjan at infradead.org

it messes up the replies.

	Ingo

[*] IRQ migration (where you possibly got the sysfs idea from) 
    is a special case where 'slow scheduling' via a user-space 
    daemon is possible: IRQs are an external source of events 
    and they are concentrators of work. The same concept does 
    not apply to timers, most of which are inherently 
    task-generated.


