[Ksummit-2012-discuss] [ATTEND] "Modularize" built-in components to expunge them if they are unnecessary; Wither the baseline attendance.

Konrad Rzeszutek Wilk konrad at darnok.org
Mon Jun 18 16:05:57 UTC 2012


On Sun, Jun 17, 2012 at 11:48 PM, H. Peter Anvin <hpa at zytor.com> wrote:
> On 06/17/2012 12:28 PM, Konrad Rzeszutek Wilk wrote:
>> Hello All,
>>
>> Would like to discuss a mechanism to "modularize" built-in components. Meaning that
>> drivers/subsystems that are built in with CONFIG_XX=y if they fail to start
>> (say AMD IOMMU on Intel hardware) anything but zero are expunged from the .text section.
>> Specifically I want to discuss various methods this can be achieved for this:
>> 1) make various early stage drivers behave as modules and load them the same way-ish
>> (hand-waving) as modules. 2) compaction and re-linking of various text-section around
>> the codes that did not get loaded 3) other ways?
>>
>
> Funny enough, I actually discussed this exact thing with Linus over 10
> years ago; it might be close to 15 years ago now.
>
> The idea was basically "pre-linked dynamic modules", however, Linus
> wanted them allocated out of the direct map rather than vmalloc space
> like runtime modules -- this is doable since it is a static allocation
> and we just punch holes.
>
> So far so good.  This is where it gets ugly:
>
> 1. Internal fragmentation.  If each module pads .text/.rodata/.data to
> page boundaries, there can be quite a bit of memory lost.

Could runtime re-linking of symbols/addresses solve this? That is, not
placing the modules on page boundaries (at least for .text) but right next
to each other? Or is that a really bad idea because the CPU could end up
fetching cold .text along with hot .text?

For .rodata/.bss sections that are cold, I think packing them tightly
would be OK, while the hot .rodata/.bss/.data should get some breathing
space for the CPU cache? Perhaps the padding should not be based on page
boundaries but on cache-line alignment?
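
To put rough numbers on the alignment question, here is a minimal sketch in
C that compares the padding lost when sections are rounded up to a page
versus a cache line. The helper names (align_up, padding_waste) are made up
for illustration, and the 4096-byte page and 64-byte cache line are assumed
values, not something measured on a real kernel image.

```c
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/*
 * Total bytes of padding lost if each of the `count` sections in `sizes`
 * is padded up to a multiple of `align`.
 */
static size_t padding_waste(const size_t *sizes, size_t count, size_t align)
{
    size_t waste = 0;

    for (size_t i = 0; i < count; i++)
        waste += align_up(sizes[i], align) - sizes[i];
    return waste;
}
```

For three hypothetical sections of 5000, 300 and 12288 bytes, padding to
4096 loses 6988 bytes, while padding to 64 loses 76 bytes.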

>
> 2. Messes with R/RW/RX separation.  The holes being created by freeing
> modules means large pages get broken up, adding to TLB pressure.

Can't the freed space be re-used? I was thinking of the kernel loading
these internal modules (.moduleX.text, .moduleX.bss, etc) right at the
end of the .text section (so at __end), and their .bss at the end of a
moving .bss virtual address counter. Then, if the module is successfully
loaded, advance __end to the end of the module and round it up to the
cache-line alignment. Ditto for the .bss.

If we fail to load the module, we just re-use that virtual space.
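
The "advance on success, re-use on failure" idea above can be sketched as a
simple bump allocator. This is a userspace illustration only, not kernel
code: struct region, region_load and the two demo init functions are
hypothetical names, offsets stand in for virtual addresses, and the 64-byte
cache line is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_ALIGN 64 /* assumed cache-line size */

/* Reserved space for pre-linked modules; `end` plays the role of __end. */
struct region {
    size_t end;   /* offset of the first free byte */
    size_t limit; /* total size of the reserved space */
};

/* Hypothetical module init hook: 0 on success, non-zero on failure. */
typedef int (*module_init_fn)(void);

static int demo_init_ok(void)   { return 0; }
static int demo_init_fail(void) { return -1; }

static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/*
 * Try to keep a module of `size` bytes at the current end of the region.
 * On success, `end` moves past the module, rounded up to CACHE_ALIGN.
 * On failure, `end` is left alone so the next module re-uses the space.
 */
static bool region_load(struct region *r, size_t size, module_init_fn init)
{
    size_t start = r->end;
    size_t new_end;

    if (size > r->limit - start)
        return false; /* does not fit in the reserved space */
    if (init() != 0)
        return false; /* init failed: space stays free */

    new_end = align_up(start + size, CACHE_ALIGN);
    r->end = new_end < r->limit ? new_end : r->limit;
    return true;
}
```

Starting from an empty 4096-byte region, loading a 100-byte module with
demo_init_ok moves end to 128; a following module whose init fails leaves
end at 128, so the next successful module starts there.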

[heavy hand-waving] The pre-allocated space where those modules would
be located would still be a PMD, and the kernel build would determine
what it thinks the max size is. Later on, once we have gone through
all of the modules and only used up, say, 70% of the max size, we can use
the remainder for other stuff (thinking .bss here).
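
As a back-of-the-envelope sketch of handing back the unused tail, assuming
the reservation is a single 2 MiB PMD-sized mapping (as on x86-64 with 4 KiB
base pages) and that the leftover should start on a page boundary so it can
get different permissions. The names RESERVED_SIZE, PAGE_SZ, struct span and
leftover are hypothetical.

```c
#include <stddef.h>

#define RESERVED_SIZE (2u * 1024 * 1024) /* assumed: one 2 MiB PMD mapping */
#define PAGE_SZ       4096u              /* assumed base page size */

struct span {
    size_t start; /* offset of the reusable tail within the reservation */
    size_t len;   /* length of the tail in bytes, 0 if nothing is left */
};

/* Given the bytes consumed by loaded modules, return the page-aligned tail. */
static struct span leftover(size_t used)
{
    struct span s = { 0, 0 };
    size_t start = (used + PAGE_SZ - 1) & ~(size_t)(PAGE_SZ - 1);

    if (start < RESERVED_SIZE) {
        s.start = start;
        s.len = RESERVED_SIZE - start;
    }
    return s;
}
```

With 1468006 bytes used (roughly 70% of 2 MiB), the tail starts at offset
1470464 and is 626688 bytes long, i.e. 153 pages.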

>
> The good part of this method is that being pre-linked, these modules do
> not add significantly to the boot time.  Of course, this assumes someone
> actually feels motivated to implement them, which hasn't happened yet.

I am feeling brave enough to sign up for this (and I am more than happy to
divvy up the work if any other folks want to jump in the boat
with me) - but I would love to sit down and brainstorm the potential
pitfalls that are going to appear.
>
> There are other methods, of course, and some might have different
> tradeoffs; in particular something that explicitly creates a compact set
> of in-use modules might have lower runtime overhead, but probably a
> higher boot time penalty.

Don't you mean the other way around? You would have a lower boot time
penalty b/c you would only load a smaller set of modules? And at
runtime, udev in userspace would go through and try to load whatever
else it can think of?

