[Ksummit-2008-discuss] Suggested topic: possible

James Bottomley James.Bottomley at HansenPartnership.com
Tue Aug 5 09:28:04 PDT 2008


On Tue, 2008-08-05 at 16:16 +0100, David Woodhouse wrote:
> > The case in PCs, with the complexity of SSD required bandwidth, makes
> > that pretty hard to get.
> > 
> > 
> > Such an implementation would work, maybe, but for sure will not give
> > optimized solution – performance & endurance.
> 
> I'm unconvinced. I see no fundamental reason why access to raw flash
> should hamper our performance. Traditionally it's been relatively slow,
> sure -- but that's largely because we've had crap flash controllers. The
> chipset on the initial revisions of the OLPC XO, for example, was giving
> us a maximum read speed of 3.5MiB/s. So we built a new controller which
> gave us DMA and decent hardware ECC, and an order of magnitude speedup.
> 
> As for endurance -- one of my reasons for _wanting_ to access raw flash
> is the reliability and endurance factor.
> 
> Typically, the devices with the translation layer built in have been
> considered 'disposable'. Things like CompactFlash to go in your camera,
> USB sticks to treat like floppies, etc. And it shows.
> 
> When we did powerfail testing on JFFS2, we also did the same testing on
> some CF-style devices. And while we could fix JFFS2, we found that the
> devices with their own internal translation layer would become corrupt
> and die, quite regularly. Even if we weren't actually writing to them
> while we pulled the power.
> 
> Because the translation layer was hidden inside the device, there was no
> way we could repair it -- and certainly we couldn't fix the bugs which
> caused the breakage. If we were lucky, we could do a low-level reformat
> of the device; mostly we just got to throw it away.
> 
> The 'Trim' proposal at least addresses one of the other major issues
> with the imposed layering, which is that the underlying translation
> layer would need to preserve obsolete data during garbage collection and
> wear levelling. But it doesn't fix _everything_. When the file system is
> allowed to see what's going on, it can combine the defragmentation and
> wear levelling operations, and ensure a 'naturally' balanced wear
> pattern. When the device is doing things underneath without the host
> system's knowledge, you just have to trust it.
> 
> And these things have demonstrated themselves to be fundamentally
> untrustworthy in the past -- although one would _hope_ that if we're
> starting to put them in laptops and use them for 'real' storage, that's
> going to improve.

It's one of those abstraction and layering issues: where should the FTL
be?

In the old days, it was thought you could make a cheaper spinning disk
by presenting it almost directly to the OS as IDE with direct control
over cylinders, heads and sectors.  Nowadays, I don't think anyone
believes this is the correct thing to do ... even IDE drives are hedged
around with internal state models like SCSI ones and you have to ask
nicely even to get access to the error correction information, never
mind exactly controlling the head movements and geometry.

The point is that when drives were simple and electronics expensive, the
IDE idea of letting the OS control the head made a certain amount of sense.
When the onboard electronics became cheap, and the limitations of having
the OS control the head were truly exposed, all disks suddenly became
linear arrays of sectors.  I have a suspicion that the same analogy will
apply to the FTL.  The OS may be able to do a better job now, but in the
end, the black box paradigm of a simple linear array of sectors will
prove easier.

I think trim makes sense whichever way we go (either it helps the Linux
FTL manage the blocks, or it provides a better paradigm for the black
box FTL inside the device).
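
To make the trim argument concrete, here is a minimal sketch of why a
translation layer that is never told about deletions ends up preserving
obsolete data during garbage collection.  Everything in it (ToyFTL,
gc_cost, the 8-page erase block) is a hypothetical toy model invented
for illustration, not the behaviour of any real device or of any Linux
FTL.

```python
class ToyFTL:
    """Hypothetical toy model of a flash translation layer (illustration only)."""

    def __init__(self):
        self.live = set()   # logical sectors the FTL believes hold valid data
        self.copied = 0     # pages copied elsewhere during garbage collection

    def write(self, lba):
        # Any written sector is assumed valid until told otherwise.
        self.live.add(lba)

    def trim(self, lba):
        # The host tells the FTL this sector no longer holds useful data.
        self.live.discard(lba)

    def garbage_collect(self, block_lbas):
        # Before an erase block can be erased, every page still believed
        # live has to be copied somewhere else.
        moved = [lba for lba in block_lbas if lba in self.live]
        self.copied += len(moved)
        return moved


def gc_cost(use_trim):
    """Return how many pages GC must copy for one 8-page erase block."""
    ftl = ToyFTL()
    block = list(range(8))
    for lba in block:
        ftl.write(lba)

    # The file system deletes a file covering six of the eight sectors.
    deleted = [0, 1, 2, 3, 4, 5]
    if use_trim:
        for lba in deleted:
            ftl.trim(lba)

    ftl.garbage_collect(block)
    return ftl.copied
```

In this toy model, gc_cost(False) copies all eight pages because the FTL
cannot know six of them are dead, while gc_cost(True) copies only the two
that still matter.  The same bookkeeping applies whether the FTL lives in
the host or inside the device.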

Incidentally, trim is now "Data Set Management" and this is the latest
proposal for ATA8:

http://www.t13.org/Documents/UploadedDocuments/docs2008/e07154r6-Data_Set_Management_Proposal_for_ATA-ACS2.pdf

James



