[Ksummit-2008-discuss] Suggested topic: possible

David Woodhouse dwmw2 at infradead.org
Tue Aug 5 08:16:17 PDT 2008


On Tue, 2008-08-05 at 07:04 +0300, Eyal Shani wrote:
> 
> For a second there I thought you meant SSD management should be left for
> the SSDs themselves.
> 
> However, I then continued reading, and understood you feel the
> opposite – you suggest writing an FS for RAW flash..?!

Absolutely. I already wrote one: http://david.woodhou.se/jffs2.pdf

That's a few years old now, and its design target was something like
32MiB of NOR flash -- so it's fairly much due for replacement. We now
have alternatives such as Nokia's UBIFS:
http://www.linux-mtd.infradead.org/doc/ubifs_whitepaper.pdf
and Jörn Engel's "logfs": http://logfs.org/logfs/

I'm also plotting to make btrfs work on raw flash too. Its wandering
tree design should work quite nicely, and I think there'll be benefits
from using a 'mainline' file system.

> This has been tried a few years ago, in the Cellular business, and
> worked OK for SLC - just until MLC came to ruin the dream.

Yeah, MLC certainly makes life 'interesting'. We can cope with it
though. JFFS2 doesn't yet -- mostly because I've been hoping that MLC
_is_ just a bad dream, especially the "you write to page A and lose
power, and page B elsewhere on the flash gets corrupted" parts...

> The case in PCs, with the complexity of SSD required bandwidth, makes
> that pretty hard to get.
>
> Such an implementation would work, maybe, but for sure will not give
> an optimized solution – performance & endurance.

I'm unconvinced. I see no fundamental reason why access to raw flash
should hamper our performance. Traditionally it's been relatively slow,
sure -- but that's largely because we've had crap flash controllers. The
chipset on the initial revisions of the OLPC XO, for example, was giving
us a maximum read speed of 3.5MiB/s. So we built a new controller which
gave us DMA and decent hardware ECC, and an order of magnitude speedup.

As for endurance -- one of my reasons for _wanting_ to access raw flash
is the reliability and endurance factor.

Typically, the devices with the translation layer built in have been
considered 'disposable'. Things like CompactFlash to go in your camera,
USB sticks to treat like floppies, etc. And it shows.

When we did powerfail testing on JFFS2, we also did the same testing on
some CF-style devices. And while we could fix JFFS2, we found that the
devices with their own internal translation layer would become corrupt
and die, quite regularly. Even if we weren't actually writing to them
while we pulled the power.

Because the translation layer was hidden inside the device, there was no
way we could repair it -- and certainly we couldn't fix the bugs which
caused the breakage. If we were lucky, we could do a low-level reformat
of the device; mostly we just got to throw it away.

The 'Trim' proposal at least addresses one of the other major issues
with the imposed layering, which is that the underlying translation
layer would need to preserve obsolete data during garbage collection and
wear levelling. But it doesn't fix _everything_. When the file system is
allowed to see what's going on, it can combine the defragmentation and
wear levelling operations, and ensure a 'naturally' balanced wear
pattern. When the device is doing things underneath without the host
system's knowledge, you just have to trust it.
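
To make the garbage collection point concrete, here is a minimal Python
sketch. It is a toy model, not how any real translation layer or the
actual Trim command is implemented: the function name gc_copies, the
page numbers and the set names are all made up for illustration. It
assumes the device treats a page as valid unless it has been overwritten
or the host has explicitly told it the page is obsolete.

```python
def gc_copies(erase_block, superseded, trimmed=frozenset()):
    """Return the logical pages a (hypothetical) translation layer must
    copy elsewhere before it can erase `erase_block`.

    erase_block -- logical page numbers currently stored in the block
    superseded  -- pages the device knows are stale (rewritten later)
    trimmed     -- pages the host has declared obsolete
    """
    return [page for page in erase_block
            if page not in superseded and page not in trimmed]


# Illustrative numbers only (assumed, not measured).
erase_block = [10, 11, 12, 13, 14, 15, 16, 17]
superseded = {10, 11}              # overwritten; the device can see this
deleted_by_fs = {12, 13, 14, 15}   # file deleted; only the file system knows

# Without any hint from the host, the deleted file's pages look valid
# and have to be preserved across garbage collection.
without_trim = gc_copies(erase_block, superseded)

# With the hint, only the genuinely live pages are copied.
with_trim = gc_copies(erase_block, superseded, trimmed=deleted_by_fs)

print(len(without_trim), "pages copied without the hint")
print(len(with_trim), "pages copied with the hint")
```

A file system running directly on raw flash has the deleted_by_fs
information to hand already, so in this model it never pays for the
extra copies in the first place.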

And these things have demonstrated themselves to be fundamentally
untrustworthy in the past -- although one would _hope_ that if we're
starting to put them in laptops and use them for 'real' storage, that's
going to improve.

> There are just too many different internal architectures out there,
> with different focus & flash technologies, to have SW deal with it.

I think you underestimate the flexibility and adaptability of Linux.
We're actually quite good at coping, relatively efficiently, with just
about anything the hardware guys can throw at us.

When we've finished laughing, that is. Or crying.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse at intel.com                              Intel Corporation




