National Museum of Natural History: storing the images of the Great Herbarium

The Museum of Natural History

Created by royal decree in 1635, what was at first the royal botanical gardens became the Museum of Natural History in 1793. The museum is widely recognized for its expertise, and its missions include the conservation and enrichment of its exceptional collections, research, education, and sharing knowledge with the public. Two thousand people work at the museum, of whom 500 are researchers. They are assisted in their tasks by the IT services department under the leadership of Henri Michiels. The department manages more than 2,000 workstations and 120 physical or virtual servers located in three computer rooms at the botanical gardens. The systems and the storage units are connected by fiber-optic cable.

Digitizing the herbarium drives up storage needs

The storage needs start to increase significantly around 2004, when the task of digitizing the specimens in the collections begins. This generates many images, with sizes ranging from a few megabytes to almost 200 MB. The museum responds to this need by implementing a fiber-optic SAN that uses the Fibre Channel protocol and connects the three computer rooms at the botanical gardens. The SAN storage arrays and the servers are connected to this network. A secure backup system protects the data.

However, the start of the renovation of the Great Herbarium in 2009 puts a strain on this architecture. The herbarium, which dates from the 17th century, is, together with the one in London, the largest in the world, containing more than 10 million sheets in A3 format. The renovation project includes reconditioning the sheets, integrating pending specimens, reorganizing the collections and digitizing them. This last task is considered key for sharing knowledge effectively. The volume of data to be stored is estimated at 500 terabytes!
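
As a rough sanity check on that estimate, the figures above can be turned into an average storage footprint per sheet. The sketch below is only a back-of-the-envelope calculation: it assumes decimal units (1 TB = 10^12 bytes), takes the lower bound of 10 million sheets, and treats the whole 500 TB as spread evenly across the sheets. None of these assumptions comes from the museum itself.

```python
# Back-of-the-envelope check of the 500 TB estimate.
# Assumptions (not from the museum): decimal units, exactly 10 million
# sheets, and the full volume divided evenly across all sheets.
SHEETS = 10_000_000          # "more than 10 million sheets" (lower bound)
TOTAL_BYTES = 500 * 10**12   # 500 terabytes, with 1 TB = 10**12 bytes


def average_size_per_sheet_mb(total_bytes: int, sheets: int) -> float:
    """Return the average storage per sheet in megabytes (1 MB = 10**6 bytes)."""
    return total_bytes / sheets / 10**6


avg_mb = average_size_per_sheet_mb(TOTAL_BYTES, SHEETS)
print(f"Average storage per sheet: {avg_mb:.0f} MB")
```

Under these assumptions the average comes to about 50 MB per sheet, which sits within the range of a few megabytes to almost 200 MB quoted for the digitized images.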

Extending the SAN is not possible

The initial plan is to store the data on the SAN, but a cost analysis shows that this solution is not economically viable given the budget constraints and the estimated data volumes for the herbarium. “We quickly discarded a solution based on a SAN, since the cost per gigabyte is too high and it had to be duplicated using backups, incurring additional software and storage hardware costs,” says Henri Michiels, IT director at the Museum of Natural History.

Seeking large storage space at low cost

The IT department states the storage needs plainly: the data will not be accessed very frequently, but they need to be constantly available, and some lag time is acceptable. The data need to be protected without resorting to classic weekly backups, as such systems are not suitable for static data. Still, security must not be compromised, since any loss of the original data is unacceptable when no copies exist! The infrastructure must ensure long-term preservation of the data at minimal purchase and operating cost, which implies low energy consumption. In short, the need is for a durable, open, secure and inexpensive storage system, an equation that is difficult to solve.