Storage overview

The ecocloud Platform has different types of data storage, each with different properties that you need to be aware of.

When you run a server, typically you will be using a tool to process input data to produce result data. This article describes the different options available for obtaining and storing the input data, and for storing the result data.

Types of storage

Storage can be divided into two categories:

  • Internal storage inside the ecocloud Platform; and
  • External storage outside the ecocloud Platform.

Internal storage

Internal storage is directly mounted by servers, so performance is very good and it is easily accessed by scripts and notebooks.

There are two types of internal storage:

  • Your ecocloud workspace storage. This is where notebooks and scripts should be stored. Files and directories in your workspace are kept after the server is terminated.
  • The server's scratch storage. This is where the working data should be stored. The input data should be read from this type of storage and results can be written to this type of storage.

Currently, you are allocated 10 GB of workspace storage (where 1 GB is 10^9 bytes).
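
To see how close a directory is to the workspace allocation, the file sizes under it can be added up and compared with the limit. The following is a minimal sketch, not an ecocloud tool: the function names are hypothetical, the directory to check is whatever path your workspace is mounted at (not stated in this article), and it assumes the decimal convention of 10^9 bytes per GB.

```python
import os

# Assumption: 10 GB allocation, counted as 10 x 10^9 bytes.
WORKSPACE_QUOTA_BYTES = 10 * 10**9


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links rather than following them.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def quota_fraction_used(root, quota_bytes=WORKSPACE_QUOTA_BYTES):
    """Return the fraction of the quota taken up by the files under root."""
    return directory_size_bytes(root) / quota_bytes
```

For example, calling quota_fraction_used on your workspace directory and getting a value close to 1.0 would suggest moving large files to scratch or external storage.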

External storage

External storage is accessed over the Internet, so it is harder for scripts and notebooks to access, and performance is poorer. Reliability also depends on the performance and availability of the network and the remote storage service.

Some of the supported types of external storage are:

  • Files on your local computer (i.e. the computer running the Web browser used to access the ecocloud Platform);
  • Dropbox;
  • Public datasets found using ecocloud Explorer.

External storage is provided by third parties, not by the ecocloud Platform.

Using storage

Trade-offs

The different types of storage have different trade-offs, which need to be considered when choosing where to store your data:

  • Workspace storage is permanent, but has limited capacity, so it is not suitable for storing large data files.
  • Server scratch storage has a larger capacity, but it is not permanent.
  • Internal storage has better performance and is easily accessed, but it has limited capacity (and scratch storage is not persistent).

Important: any data stored in the server's scratch storage will be lost when the server is terminated, and servers may be automatically terminated. Remember to copy any data you want to keep to external storage before that happens.
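
When there are many result files, it can be easier to copy them off as a single archive. The following is a minimal sketch using Python's standard library. The function name and both paths are hypothetical, and how the archive is then transferred to external storage depends on the server and storage type.

```python
import shutil
from pathlib import Path


def bundle_results(results_dir, archive_base):
    """Zip everything under results_dir into archive_base + '.zip'.

    archive_base should point outside results_dir so that the archive
    does not end up inside the directory being archived.
    """
    results_dir = Path(results_dir)
    if not results_dir.is_dir():
        raise FileNotFoundError(f"No such results directory: {results_dir}")
    # make_archive appends the '.zip' extension and returns the full path.
    archive_path = shutil.make_archive(
        str(archive_base), "zip", root_dir=str(results_dir)
    )
    return Path(archive_path)
```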

Recommended workflow

The recommended workflow is to:

  1. Launch a server.
  2. Copy the input data from external storage to the server's scratch storage.
  3. Save your notebooks and scripts to your workspace storage.
  4. Perform computations and write the results to the server's scratch storage.
  5. Copy any results you want to keep from the server's scratch storage to external storage.
  6. Terminate the server.

This is because the tools running in the server can access the server’s scratch storage much more quickly and reliably than going over the network to access external storage. Alternatively, you could write code to directly access external storage, if that is more suitable for the type of processing you want to perform.
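
Steps 2, 4 and 5 of the workflow can be sketched in Python as follows. This is an illustration only, under stated assumptions: ordinary file copies stand in for the transfers to and from external storage (the real method depends on the server and storage type), and the function name, the result file name and all paths are hypothetical.

```python
import shutil
from pathlib import Path


def run_workflow(input_file, scratch_dir, output_dir, process):
    """Stage input into scratch, compute there, then copy the result out.

    process is a callable taking (input_path, result_path); it reads the
    staged input and writes its result to result_path.
    """
    scratch_dir = Path(scratch_dir)
    scratch_dir.mkdir(parents=True, exist_ok=True)

    # Step 2: copy the input data into scratch storage.
    staged_input = scratch_dir / Path(input_file).name
    shutil.copy2(input_file, staged_input)

    # Step 4: compute, writing the result to scratch storage.
    result_file = scratch_dir / "result.out"  # hypothetical file name
    process(staged_input, result_file)

    # Step 5: copy the result you want to keep out of scratch storage.
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    kept_copy = output_dir / result_file.name
    shutil.copy2(result_file, kept_copy)
    return kept_copy
```

Keeping the computation as a separate callable means the staging and copy-out steps stay the same whatever tool does the processing.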

Copying files

The method for copying data to/from external storage depends on the type of ecocloud server and the type of external storage being used. Please see these support articles for the details.

When using a Jupyter server (e.g. for running Python or R Notebooks, or RStudio):

When a Virtual Desktop server is running:

  • Local computer storage and Virtual Desktops (coming soon);
  • Dropbox and Virtual Desktops (coming soon).

Modified on: 2018-10-08 14:57:15 +1100
