Demos
Take Syndicate for a spin by using it to access public datasets.
Syndicate for Scientific Datasets
Various scientific datasets are available through Syndicate. Syndicate lets you access large-scale scientific datasets from your laptop, without manually staging (or downloading) them first.
Syndicate Dataset Manager (SDM)
Syndicate Dataset Manager (SDM) helps you to mount scientific datasets to your filesystem. Once mounted, you can access the datasets immediately.
The following video shows how easy it is to mount a dataset:
The following video also shows how SDM simplifies a scientific data-analysis workflow:
Available Datasets
iMicrobe: Metagenomic samples for microbial ecology.
http://imicrobe.us/
Take Syndicate Dataset Manager (SDM) for a Spin
The easiest way to take Syndicate Dataset Manager (SDM) for a spin is with Docker. We provide pre-built Docker images that have SDM installed. Run one of these images, then start your analysis by mounting the dataset you want inside the container.
Step 1. Select a Docker Image
For most users, the SDM Plain image is a good starting point. If you frequently need specific apps in your analysis, check out the images that come with popular apps preinstalled.
SDM Plain (For most users):
- syndicatestorage/sdm (Ubuntu 14.04, Minimal)
- syndicatestorage/sdm-anaconda (Ubuntu 14.04, Anaconda 4.3.1)
- syndicatestorage/sdm-anvio (Ubuntu 14.04, Anvio 2.3.2)
- syndicatestorage/sdm-jupyter (Ubuntu 14.04, Anaconda 4.3.1, Jupyter)
- syndicatestorage/sdm-mash (Ubuntu 14.04, Mash)
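As a rough illustration of choosing among the images listed above, the sketch below maps a short keyword to an image name. The function name sdm_image_for and the keywords themselves are hypothetical labels invented for this example; only the image names come from the list.

```shell
#!/bin/sh
# Sketch: map a keyword to one of the image names listed above.
# sdm_image_for and its keywords are hypothetical; they are not part of SDM.
sdm_image_for() {
    case "${1:-plain}" in
        plain)    echo "syndicatestorage/sdm" ;;
        anaconda) echo "syndicatestorage/sdm-anaconda" ;;
        anvio)    echo "syndicatestorage/sdm-anvio" ;;
        jupyter)  echo "syndicatestorage/sdm-jupyter" ;;
        mash)     echo "syndicatestorage/sdm-mash" ;;
        *)
            # Unknown keyword: report on stderr and signal failure.
            echo "unknown image keyword: $1" >&2
            return 1
            ;;
    esac
}
```

The result can then be handed to Docker, for example with docker pull "$(sdm_image_for mash)" to download the image ahead of time.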
Step 2. Run a Docker Image
For most images:
Type the following on the console to download and run a Docker image:
docker run -ti --privileged <docker-image-name>
(Note: Privileged access, '--privileged', is required.)
For images using Jupyter:
Type the following on the console to download and run a Docker image:
docker run -ti --privileged -p 8888:8888 <docker-image-name>
(Note: Port 8888, used by Jupyter, is mapped to host port 8888.)
This starts a Jupyter Notebook server that is reachable locally on port 8888. Open a web browser (e.g. Firefox or Chrome) and paste the URL shown in the console to access it.
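The two variants of the run command can be folded into one small wrapper. This is only a sketch: the function name sdm_run and the DRY_RUN variable are hypothetical, and the wrapper assumes that every Jupyter image has "jupyter" in its name, which holds for the images listed in Step 1 but is not guaranteed in general.

```shell
#!/bin/sh
# Sketch: build the docker run command from Step 2 for a given image.
# sdm_run and DRY_RUN are hypothetical names used for illustration only.
sdm_run() {
    image="${1:-}"
    if [ -z "$image" ]; then
        echo "usage: sdm_run <docker-image-name>" >&2
        return 2
    fi
    # Base command; --privileged is required, as noted above.
    set -- docker run -ti --privileged
    # Assumption: Jupyter images contain "jupyter" in their name.
    case "$image" in
        *jupyter*) set -- "$@" -p 8888:8888 ;;
    esac
    set -- "$@" "$image"
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        echo "$*"
    else
        "$@"
    fi
}
```

Running DRY_RUN=1 sdm_run syndicatestorage/sdm-jupyter prints the command without starting a container, which is a convenient way to check the port mapping before launching.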
Step 3. Mount a dataset using SDM
On the console:
Use the following command to mount a dataset:
sdm mount <dataset-name>
In Jupyter:
Use the following command to mount a dataset:
!sdm mount <dataset-name>
(Note: The exclamation mark (!) runs the rest of the line as a shell command.)
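When mounting from a script rather than by hand, it can help to fail early with a clear message if sdm is not available, for instance when the script is run outside one of the SDM images. The wrapper below is a minimal sketch; the name sdm_mount_checked is hypothetical, and it relies only on the sdm mount <dataset-name> form shown above.

```shell
#!/bin/sh
# Sketch: call "sdm mount" only after basic sanity checks.
# sdm_mount_checked is a hypothetical helper, not part of SDM.
sdm_mount_checked() {
    dataset="${1:-}"
    if [ -z "$dataset" ]; then
        echo "usage: sdm_mount_checked <dataset-name>" >&2
        return 2
    fi
    # Make sure the sdm command exists before calling it.
    if ! command -v sdm >/dev/null 2>&1; then
        echo "sdm not found; run this inside an SDM Docker image" >&2
        return 127
    fi
    sdm mount "$dataset"
}
```

The function passes the dataset name through unchanged and returns whatever exit status sdm mount reports.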
Other Use Cases
Mounting public datasets on your laptop is just one of many use cases Syndicate is designed to support, and extending that set is just a matter of writing more Syndicate drivers. For example, Syndicate currently supports the following backend data stores and user-facing workflows:
- Jupyter Notebook -- Supporting data analysis & visualization.
- Hadoop -- Supporting Big-data analysis.
- iRODS -- Supporting institutional datasets.
- S3 -- Supporting data replication.