101communication LLC CertCities.com -- The Ultimate Site for Certified IT Professionals

Notes from Underground

 Storage Consolidation and Virtualization: Part 1
First in a series of columns where James looks at storage issues for Linux/Unix admins.
by James Ervin  
7/10/2002 -- In 1965, Gordon Moore, who would later co-found Intel, observed that the number of components on an integrated circuit at minimum cost was doubling every year. Though it originally described a specific trend in chip manufacturing, "Moore's Law" proved remarkably predictive of advances in other high-tech industries. However, as Moore was aware, exponential growth is unlimited only in the mathematical sense, and many technologies fail to keep pace with his prophecy.

For instance, memory speeds are improving more slowly than processor speeds, resulting in a processor-memory gap. Today's processors typically spend more time waiting for data to be retrieved than processing it. Consequently, modern computers are stuffed with software and hardware caches, so that frequently used data need not be retrieved from memory repeatedly. Mismatched component performance of this sort limits a system's capabilities, but clever design can usually ameliorate the deficiencies.
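
A rough back-of-the-envelope model shows why caches soften the gap. The sketch below is only an illustration: the function name and the latency figures (2 ns for cache, 100 ns for main memory) are assumptions, not measurements of any particular system.

```python
def effective_access_ns(hit_rate, cache_ns, memory_ns):
    """Average access time when a fraction `hit_rate` of requests hits the cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns


# Assumed, illustrative latencies: 2 ns cache, 100 ns main memory.
for rate in (0.0, 0.90, 0.99):
    average = effective_access_ns(rate, 2.0, 100.0)
    print(f"hit rate {rate:.0%}: {average:.2f} ns average")
```

Even under these made-up numbers the pattern is clear: the average access time is dominated by the misses, so a small improvement in hit rate buys a large improvement in effective speed.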

Network bandwidth and storage bandwidth are also diverging, though many people seem to think they're in competition. The recent ratification of the 10 Gigabit Ethernet standard seems to confirm what vendors such as NetApp have been saying at conferences: network bandwidth will eventually outstrip storage bandwidth, although the 2 gigabit fibre channel standard is currently beating Gigabit Ethernet. This is not lost on storage vendors, who are readying products based on a new set of networked storage protocols. "Internet SCSI" or iSCSI and the many competing fibre channel networking variants (mFCP, iFCP, FC-BB, FCIP) are among the contenders.

To make sense of this mess, we need to divide these concepts of "network bandwidth" and "storage bandwidth" into their component parts. In this column we'll look at some of the new storage architectures; in the next we'll go into the details of the competing protocols in more depth, and see whether Linux/UNIX-compatible products will actually be available anytime soon.

Inside the Server
Neither network nor storage protocols have anything to do with a hard disk's throughput. A moment's thought should make it clear that hard disk throughput -- a product of storage density, platter speed, and other factors -- is the most significant bottleneck in any storage architecture, though it's well hidden. If disks were faster than everything else, we wouldn't need to use RAID arrays to increase throughput. This is the monkey on the back of all physical media: speed increases the farther away you get from the moving parts. High-performance computers vastly prefer to shift electrons around instead of clunky drive heads.
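
As a minimal sketch of how a RAID array spreads work across disks, the following assumes simple round-robin striping (in the style of RAID 0) with one block per stripe unit. The function names and the 40 MB/sec per-disk figure are hypothetical, and the throughput estimate is a best case that ignores controller and bus limits.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk)
    under simple round-robin striping."""
    if num_disks < 1:
        raise ValueError("need at least one disk")
    return logical_block % num_disks, logical_block // num_disks


def ideal_throughput_mb_s(per_disk_mb_s, num_disks):
    """Best-case sequential throughput: every disk streams in parallel."""
    return per_disk_mb_s * num_disks


for block in range(8):
    disk, offset = locate_block(block, 4)
    print(f"logical block {block} -> disk {disk}, offset {offset}")

# Four assumed 40 MB/sec disks, best case.
print(ideal_throughput_mb_s(40, 4), "MB/sec")
```

Consecutive logical blocks land on different disks, so a long sequential read keeps every spindle busy at once instead of waiting on a single set of drive heads.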

Some doomsayers are also predicting that increases in storage capacity will taper off within the decade, because the limits of magnetic media are in sight, if still distant. Advanced storage research now involves nanometer-scale punch cards and more esoteric technologies. If the storage density of magnetic hard-drive technology does indeed taper off before alternatives are available, that will deal a significant blow to future increases in storage speed, but not an insurmountable one, since the difficulties of physical storage are well understood: the more spindles you can devote to your data, the better.

The internal bus, or channel on which the data travels inside the server, is a second potential bottleneck. The most significant development here is the InfiniBand standard, which is poised to become the successor to PCI. InfiniBand's base link signals at 2.5 Gbit/sec, and 4x and 12x links aggregate to 10 and 30 Gbit/sec, compared to 533 MB/sec (roughly 4.3 Gbit/sec) for 64-bit, 66 MHz PCI. That should eliminate most I/O bottlenecks inside the server, with the exception of those imposed by the physical disks themselves. InfiniBand is much more than simply a replacement for PCI, but that's a topic for the next article.
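
The 533 MB/sec figure for PCI follows from the bus width and clock rate. A minimal sketch of that arithmetic, assuming the 64-bit, 66 MHz variant of PCI and one transfer per clock cycle (the function name is made up for illustration):

```python
def parallel_bus_mb_s(width_bits, clock_mhz):
    """Peak transfer rate of a parallel bus that moves `width_bits` per clock."""
    bytes_per_cycle = width_bits / 8
    return bytes_per_cycle * clock_mhz  # bytes/cycle * million cycles/sec = MB/sec


print(round(parallel_bus_mb_s(32, 33.33)))  # 32-bit, 33 MHz PCI -> 133
print(round(parallel_bus_mb_s(64, 66.66)))  # 64-bit, 66 MHz PCI -> 533
```

These are theoretical peaks for a shared bus; every device on the bus competes for the same bandwidth, which is part of the appeal of a switched, serial replacement.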

Eliminating the Middleman: DAS, NAS and SAN
The familiar SCSI protocol for attaching storage devices to servers has been superseded by fibre channel. However, the cost of entry for fibre channel remains high. Also, most fibre channel implementations still assume that there is a server through which the data must pass before it reaches the client. This is the direct-attached storage model, finally endowed with an acronym of its own: DAS.

DAS remains useful, intuitive, and expensive to leave behind, but storage is increasingly indistinguishable from the network, another self-fulfilling prophecy. As customers demand faster, more accessible storage, vendors gravitate towards faster, more flexible technologies to cope. Data transmission rates are growing the fastest in the networking industry. Speed aside, putting storage directly on the network without an intermediary server has numerous other advantages. For instance, detaching storage from an application server gives that server a substantial performance boost; it no longer has to deal with storage devices and drivers. Two technologies exploit these possibilities: Network Attached Storage (NAS) and Storage Area Networks (SANs).

NAS devices consolidate multiple file system protocols into one hardware device. NetApp's Filer product line, for example, provides access to files via the Common Internet File System (CIFS), Network File System (NFS), or HyperText Transfer Protocol (HTTP). Interestingly, CIFS and NFS are network file system protocols, while HTTP is the World Wide Web protocol. These are very different protocols, but can all be used to locate and retrieve files. Additionally, CIFS is stateful—it has a "memory," while NFS and HTTP are stateless (every request is treated as if it came from a new client). These seem like odd birds to nest together, but end users don't distinguish between file system protocols no matter how obvious the difference. Form, in this case, follows perceived function. While generally reviled in Linux camps, Microsoft's Internet Information Server was prescient in this regard: it enabled simple, one-step access to the same data via multiple protocols (HTTP, FTP, Gopher). On UNIX servers, software such as Samba, Netatalk, and NFS can be used to serve identical data to Windows, Macintosh, and UNIX clients. Products such as NetApp's Filer simply perform these functions in hardware.
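
The stateful/stateless distinction can be sketched in a few lines. This is a toy model, not how CIFS, NFS or HTTP actually work on the wire: the two classes and their methods are hypothetical, and a temporary local file stands in for shared storage.

```python
import os
import tempfile


class StatelessServer:
    """Every request names the file and offset; the server remembers nothing."""

    def read(self, path, offset, length):
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(length)


class StatefulServer:
    """The server keeps an open handle, and thus a position, per session."""

    def __init__(self):
        self._sessions = {}
        self._next_id = 1

    def open(self, path):
        session_id = self._next_id
        self._next_id += 1
        self._sessions[session_id] = open(path, "rb")
        return session_id

    def read(self, session_id, length):
        # The position is remembered between calls.
        return self._sessions[session_id].read(length)

    def close(self, session_id):
        self._sessions.pop(session_id).close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "wb") as f:
        f.write(b"hello, storage")

    stateless = StatelessServer()
    first = stateless.read(path, 0, 5)   # client supplies the offset each time
    second = stateless.read(path, 7, 7)

    stateful = StatefulServer()
    sid = stateful.open(path)
    a = stateful.read(sid, 5)            # server tracks the position
    stateful.read(sid, 2)                # skip the comma and space
    b = stateful.read(sid, 7)
    stateful.close(sid)

    print(first, second, a, b)
```

The stateless server can be restarted between any two requests without the client noticing, while the stateful one loses its sessions; in exchange, the stateful design has somewhere to keep things like locks and open-file semantics.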

NAS devices are not to be confused with SANs, or Storage Area Networks. Whereas NAS devices work with files, SANs provide block-level access to data. A file on a SAN is accessed as if it were on a locally-attached drive; the SAN pays no attention to the files or their content. Unfortunately, most SANs are single-vendor solutions. Nor have SANs proved particularly useful in making data directly available to end users. Instead, a SAN is most useful when a common storehouse of data must be shared between a cluster of servers, all of which perform a specific function. Common SAN applications are redundant web servers that serve identical content, or computing clusters that need access to the same data. Fibre channel, the successor to SCSI, is the standard of choice for SAN installations, but has the aforementioned high entry cost and is limited by distance. As a result, SAN implementations typically reside on private networks, and servers still act as middlemen. NAS devices are cheaper, simpler to deploy, and reach as far as the Ethernet network, but perform poorly in high-performance applications because of the overhead involved in manipulating files instead of raw chunks of data.
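
The difference between file-level and block-level access can be sketched as follows. This is an illustration only: an ordinary temporary file stands in for a raw volume, and the function names and the 512-byte block size are assumptions rather than anything specific to a NAS or SAN product.

```python
import os
import tempfile

BLOCK_SIZE = 512  # assumed block size for this illustration


def read_file(path):
    """File-level access (NAS style): ask for a file by name, get its contents."""
    with open(path, "rb") as f:
        return f.read()


def read_block(device_path, block_number):
    """Block-level access (SAN style): ask for block N; no notion of files."""
    with open(device_path, "rb") as dev:
        dev.seek(block_number * BLOCK_SIZE)
        return dev.read(BLOCK_SIZE)


with tempfile.TemporaryDirectory() as tmp:
    # An ordinary file plays the part of a raw volume here.
    volume = os.path.join(tmp, "volume.img")
    with open(volume, "wb") as f:
        f.write(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE)

    whole = read_file(volume)
    block1 = read_block(volume, 1)
    print(len(whole), block1[:4])
```

In the block-level case, interpreting those bytes as files is left entirely to whichever server mounts the volume, which is why the storage itself can ignore the files and their content.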

The consolidation of network file system protocols by NAS devices is too vendor-specific to provide a wholly acceptable solution. There's no clear front-runner between the file system protocols: NFS in the UNIX world and CIFS in the Windows world are both here to stay. The addition of HTTP and FTP only confuses the issue. As the success of TCP/IP should lead us to infer, successful protocols are platform-agnostic. Next-generation storage protocols accomplish this by leveraging the IP networks already in place, thereby extending the block-level benefits of SANs to the range of NAS devices.
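
To make the idea of carrying block requests over an IP network concrete, here is a toy framing sketch. The 13-byte header below is entirely hypothetical and is not the wire format of iSCSI or of any fibre channel variant; it only shows that a block read can be reduced to a few integers that any TCP connection could carry.

```python
import struct

# Hypothetical request header, in network byte order:
#   opcode (1 byte), starting block number (8 bytes), block count (4 bytes).
HEADER = struct.Struct("!BQI")
OP_READ = 1


def encode_read(block_number, block_count):
    """Build the bytes a client would send to request a run of blocks."""
    return HEADER.pack(OP_READ, block_number, block_count)


def decode_request(payload):
    """Parse the header back into (opcode, block number, block count)."""
    return HEADER.unpack(payload[:HEADER.size])


request = encode_read(2048, 8)
print(len(request), decode_request(request))
```

Because nothing in such a request refers to a file name or a platform, the same bytes mean the same thing to any client or server, which is the platform-agnostic property the paragraph above describes.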

Next Time
Convergence of networking protocols with storage protocols appears to be the solution everyone's been waiting for, but protocol wars seem to be brewing. Since IP is the clear winner in the networking race, the only contentious choice is which storage protocol to use. The next few years have been framed as a battle between iSCSI and the victor of the three competing fibre channel over IP standards, but each is superior in certain respects. In the next column, we'll examine each of the protocols in more depth and hazard one or two guesses about the eventual victors. In the meantime, check out the following links for more information on the topics discussed above:

  • Gordon Moore's original paper
  • Moore's Law defined
  • "Itsy-Bitsy: Hard-Drives Bumping Up Against Physical Limits"
  • Limits of Magnetic Recording
  • Network and Storage Bandwidth -- A somewhat obscure chart showing the decreasing gap between network bandwidth and storage bandwidth.
  • The era of giant magnetoresistive heads
  • "Millipede" project demonstrates trillion-bit data storage density
  • Factors in Network Speed -- Future storage may not use magnetism at all…

James Ervin is alone among his coworkers in enjoying Michelangelo Antonioni films, but in his more lucid moments suspects that they're not entirely wrong.

 

All content copyright 2000-03 101communications LLC, unless otherwise noted. All rights reserved.
Reprints allowed with written permission from the publisher. For more information, e-mail