

IBM is building the largest data array in the world - 120 petabytes of storage


Tibi Puiu
August 29, 2011 @ 11:10 am


[Image: a data center]

IBM recently made public its plans to develop what will be, upon completion, the world’s largest data array: 200,000 conventional hard disk drives working together to provide 120 petabytes of available storage space. The contract for this massive array, roughly 10 times bigger than any other storage system in operation today, was placed by an “unnamed client” whose intentions have yet to be disclosed. IBM says the huge storage space will be used for complex computations, such as those used to model weather and climate.

To put things into perspective, 120 petabytes, or 120 million gigabytes, would hold roughly 24 billion typical five-megabyte MP3 files, or 60 copies of the entire internet, which currently spans some 150 billion web pages. And while 120 petabytes might sound outrageous by any sane standard today, at the rate technology is advancing it might not be long before data centers of this size become fairly common.
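Here is a quick sanity check of those figures, assuming decimal units (1 PB = 10^6 GB = 10^9 MB) and the five-megabyte MP3 used above:

```python
# Back-of-the-envelope check of the article's figures,
# assuming decimal units: 1 PB = 10**6 GB = 10**9 MB.
TOTAL_PB = 120

total_gb = TOTAL_PB * 10**6            # 120,000,000 GB -> "120 million gigabytes"
total_mb = TOTAL_PB * 10**9
mp3_size_mb = 5                        # "typical five-megabyte MP3 file"
mp3_count = total_mb // mp3_size_mb    # 24,000,000,000 -> 24 billion files

print(f"{total_gb:,} GB")
print(f"{mp3_count:,} MP3 files")
```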

“This 120 petabyte system is on the lunatic fringe now, but in a few years it may be that all cloud computing systems are like it,” says Bruce Hillsberg, IBM’s director of storage research. Just keeping track of the names, types, and other attributes of the files stored in the system will consume around two petabytes of its capacity.

I know some of you tech enthusiasts out there are already grinding your teeth at these fairly dubious numbers. I did – 120 petabytes divided by 200,000 drives comes to 600 GB per drive. Does this mean IBM is using 600 GB hard drives? I’m willing to bet they’re not going that cheap; it would be counter-productive in the first place. Firstly, it’s worth pointing out that we’re not talking about your usual commercial hard drives. Most likely, the drives used will be something like 15,000 RPM Fibre Channel disks at the very least, which beat the heck out of the SATA drive currently powering your computer’s storage. These kinds of drives don’t hold as much as SATA drives of the same generation, which might be part of the explanation. There’s also the issue of redundancy: data centers reserve capacity for it, which cuts into the usable storage, and the overhead grows as the array gets larger. So the drives used could actually be somewhere between 1.5 and 3 TB each, all running at cutting-edge data transfer speeds.
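For anyone who wants to check the arithmetic, here is the back-of-the-envelope version. The 1.5–3 TB drive sizes are only the speculation from the paragraph above; the raw capacities and overhead figures simply follow from that assumption:

```python
# Per-drive arithmetic behind the paragraph above. The 1.5 TB and 3 TB drive
# sizes are speculative, not a published IBM spec; the overheads are just
# what those sizes would imply.
TOTAL_USABLE_PB = 120
DRIVES = 200_000

usable_gb_per_drive = TOTAL_USABLE_PB * 10**6 / DRIVES
print(f"usable space per drive: {usable_gb_per_drive:.0f} GB")   # 600 GB

for drive_tb in (1.5, 3.0):
    raw_pb = DRIVES * drive_tb / 1000         # 1 PB = 1000 TB (decimal units)
    overhead = 1 - TOTAL_USABLE_PB / raw_pb   # share of raw capacity not exposed as usable space
    print(f"{drive_tb} TB drives -> {raw_pb:.0f} PB raw, {overhead:.0%} overhead")
```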

Steve Conway, a vice president of research with the analyst firm IDC who specializes in high-performance computing (HPC), says IBM’s repository is significantly bigger than previous storage systems. “A 120-petabyte storage array would easily be the largest I’ve encountered,” he says.

To house this massive number of hard drives, IBM places them horizontally in drawers, as in any other data center, but makes those drawers wider in order to accommodate more disks in a smaller space. Engineers also implemented a new data backup mechanism, whereby information from dying disks is slowly reproduced on a replacement drive, allowing the system to continue running without any slowdown. A file system called GPFS, meanwhile, spreads stored files over multiple disks, allowing the machine to read or write different parts of a given file at once, while indexing its entire collection at breakneck speed.
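To give a flavour of that striping idea, here is a toy sketch – not GPFS itself, just the general principle, with made-up disk names and an unrealistically small block size:

```python
# Toy illustration of striping: a file is split into fixed-size blocks that
# are distributed round-robin across several disks, so different parts of the
# file can be read or written in parallel. Purely illustrative; block size,
# placement policy, and disk names are invented for the example.
BLOCK_SIZE = 4          # bytes, tiny so the output stays readable
DISKS = ["disk0", "disk1", "disk2", "disk3"]

def stripe(data: bytes) -> dict[str, list[bytes]]:
    """Distribute consecutive blocks of `data` round-robin across DISKS."""
    layout: dict[str, list[bytes]] = {d: [] for d in DISKS}
    for i in range(0, len(data), BLOCK_SIZE):
        disk = DISKS[(i // BLOCK_SIZE) % len(DISKS)]
        layout[disk].append(data[i:i + BLOCK_SIZE])
    return layout

if __name__ == "__main__":
    for disk, blocks in stripe(b"a small file spread across four disks").items():
        print(disk, blocks)
```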

Last month a team from IBM used GPFS to index 10 billion files in 43 minutes, effortlessly breaking the previous record of one billion files scanned in three hours. Now, that’s something!
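A rough comparison of what those two records imply in terms of throughput:

```python
# Indexing throughput implied by the two records quoted above.
new_rate = 10_000_000_000 / (43 * 60)      # 10 billion files in 43 minutes
old_rate = 1_000_000_000 / (3 * 3600)      # 1 billion files in 3 hours

print(f"new record: ~{new_rate:,.0f} files/s")    # ~3.9 million files/s
print(f"old record: ~{old_rate:,.0f} files/s")    # ~93,000 files/s
print(f"roughly {new_rate / old_rate:.0f}x the indexing throughput")
```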

Fast access to huge storage is crucial for supercomputers, which need humongous amounts of data to run the complicated models they’re assigned, be it weather simulations or the decoding of the human genome. Of course, such systems can also be used, and most likely already are, to store identities and human biometric data. I’ll take this opportunity to remind you of a frightful fact we published a while ago: every six hours the NSA collects data the size of the Library of Congress.

As quantum computing gains ground and the first quantum computers are eventually developed, data centers of this kind will become far more common.

UPDATE: The facility did indeed open in 2012.

Source: MIT Technology Review
