Now that Lori has her new HP TouchSmart for an upcoming holiday gift, we are finally digitizing our DVD collection. You would think that since our tastes are somewhat similar, we’d be good to go with a relatively small number of DVDs… We’re not. I’m a huge fan of well-done war movies and documentaries, we share history and fantasy interests, and she likes a pretty eclectic list of pop-culture movies, so the pile is pretty big. I’m working out how to store them all on the NAS so that we can play them on any TV on the network, and that got me pondering the nature of storage access these days. We own a SAN, but it never occurred to me to put these shows on it – that would limit access to those devices with an FC card… Or we’d end up creating a share to run them all through one machine with an FC card, as a NAS head of sorts.
In the long litany of different ways that we store things – direct attached or networked, cloud or WAN, object store or hierarchical – the one that stands out as the most glaring, and the one that has traditionally gotten the most attention, is file versus block. For at least a decade the argument has raged over which is better suited to enterprise use, while most of us have watched from the sidelines, somewhat bemused by the conversation, because the enterprise is using both. As a rule of thumb, if you need to boot from it or write sectors of data to it, you need block. Everything else is generally file.
And that’s where I’m starting to wonder. I know there was a movement not too many years ago to make databases file based instead of block based, and that the big vendors were going in that direction, but I do wonder if maybe it’s time for block to retire at the OS level. Of course, for old disks to be compatible, the OS would still have to handle block, but setting it to only allow OS-level calls (I know, it’s harder with each release, that’s death by a thousand cuts though) to read/write sectors would resolve much of the problem. Then a VMware-style boot-from-file-structure would resolve the last bit. Soon we could cut our storage protocols in half. Seriously, at this point in time, what does block give us? Not much, actually. Thin/auto provisioning is available on NAS, high-end performance tweaks are available on NAS, and the extensive secondary network (be it FC or IP) is not necessary for NAS. There are some cases where throughput may demand it, but those are not your everyday case in a world of 1 Gig networks with multi-Gig backplanes on most devices. And 10 Gig is available pretty readily these days.
SAN has been slowly dying, and I’m just pondering the question of whether it should be finished off. Seriously, people say “SAN is the only thing for high-performance!” but I can guarantee you that I can find plenty of NAS boxes that perform better than plenty of SANs – it’s just a question of vendor and connectivity. I’m a big fan of iSCSI, but am no longer sure there’s a need for it out there.
Our storage environment, as I’ve blogged before, has become horribly complex, with choices at every turn, many of which are tied more to vendors and profits than to customer needs and desires. Strip away the marketing and I wonder if SAN has a use in the future of the enterprise. I’m starting to think not, but I won’t declare it dead, as I am still laughing at those who have been declaring tape dead for the last 20 years – and still are, regardless of what tape vendors’ sales look like. It would be hypocritical of me to laugh at them and make the same type of pronouncement. SAN will be dead when customers stop buying it, not before. Block will end when vendors stop supporting it, not before… So I really am just pondering the state of the market, playing devil’s advocate a bit.
I have heard people proclaim that block is much faster for database access. I have written and optimized B-Tree code, and yeah, it is. But that’s because we write databases to work on blocks. If we used a different mechanism, we’d get a different result. It is no trivial thing to move to a different storage method, but if the DB already supports file access, the work is half done. All that would remain is optimizing for the new method, or introducing shims to make chunks of files look like blocks.
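To make the shim idea a little more concrete, here is a minimal sketch in Python of what "chunks of files look like blocks" could mean. Everything in it is hypothetical and for illustration only: the class name `FileBackedBlockDevice`, the 4096-byte block size, and the `demo` helper are my assumptions, not anything a real database or vendor ships. It simply maps block number N to byte offset N times the block size inside an ordinary file, which could just as easily live on a NAS share.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size for illustration; real systems vary


class FileBackedBlockDevice:
    """Hypothetical shim: presents a regular file as fixed-size blocks."""

    def __init__(self, path, num_blocks, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.num_blocks = num_blocks
        # Create the backing file if it does not exist, then size it so
        # that every block number maps to a valid byte range.
        open(path, "ab").close()
        self._f = open(path, "r+b")
        self._f.truncate(num_blocks * block_size)

    def _check(self, index):
        if not 0 <= index < self.num_blocks:
            raise IndexError(f"block {index} out of range")

    def read_block(self, index):
        """Return the bytes of block `index`."""
        self._check(index)
        self._f.seek(index * self.block_size)
        return self._f.read(self.block_size)

    def write_block(self, index, data):
        """Overwrite block `index`; data must be exactly one block long."""
        self._check(index)
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self._f.seek(index * self.block_size)
        self._f.write(data)
        self._f.flush()

    def close(self):
        self._f.close()


def demo():
    """Write one 'page' to block 3 and confirm it reads back unchanged."""
    with tempfile.TemporaryDirectory() as tmp:
        dev = FileBackedBlockDevice(os.path.join(tmp, "db.img"), num_blocks=8)
        page = b"\x01" * BLOCK_SIZE
        dev.write_block(3, page)
        ok = dev.read_block(3) == page
        dev.close()
        return ok
```

A B-Tree that thinks in pages could call `read_block` and `write_block` without caring that a file sits underneath. What the sketch leaves out (caching, write ordering, durability guarantees) is exactly where the "optimizing for the new method" work would go.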
If you think about it, if your DB is running in a VM, this is already essentially the case. The VM is in a file, the DB is in that file… So though the DB might think it’s directly accessing disk blocks, it is not. Food for thought.