The Arteria Project
Why?
Handling data from massively parallel sequencing can be a daunting task! While the process of handling sequencing data shares many characteristics across centers, the current norm is one center, one solution. This makes reuse difficult, and the wheel gets reinvented over and over again. We hope the Arteria project can remedy this situation.
What is this?
The Arteria project provides components to automate analysis and data-management tasks at a next-generation sequencing center. It leverages a micro-service based architecture together with StackStorm to create an event-driven automation system, which is both flexible and scalable.
Arteria has two main components:
StackStorm packs
StackStorm’s concept of packs makes it easy to redistribute automation components and stitch them together into workflows. The Arteria packs can be found here: https://github.com/arteria-project/arteria-packs. They provide an excellent starting point for building your own Arteria system.
Single responsibility micro-services
These provide different functionality, such as running the Illumina bcl2fastq program, checking whether a runfolder is ready to be analyzed, or removing data once certain criteria are met. The separation of responsibilities between services means that you can pick and choose the ones that suit your particular workflow and infrastructure configuration.
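To make the idea of a single-responsibility check concrete, here is a minimal sketch of a "runfolder is ready" test. It is not the code used by any Arteria service. It assumes that the instrument writes a marker file named `RTAComplete.txt` into the runfolder when the run has finished; the function names `is_runfolder_ready` and `find_ready_runfolders` are hypothetical and chosen for illustration only.

```python
from pathlib import Path
from typing import List

# Assumption: the instrument writes this marker file when the run has finished.
COMPLETION_MARKER = "RTAComplete.txt"


def is_runfolder_ready(runfolder: Path, marker: str = COMPLETION_MARKER) -> bool:
    """Return True if the runfolder is a directory containing the marker file."""
    return runfolder.is_dir() and (runfolder / marker).is_file()


def find_ready_runfolders(monitored_dir: Path) -> List[Path]:
    """Return the ready runfolders directly under monitored_dir, sorted by path."""
    return sorted(p for p in monitored_dir.iterdir() if is_runfolder_ready(p))
```

A service built around a check like this only needs to answer one question, which is what makes it easy to swap out or leave out in a given setup.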
The catalog of services of general interest currently includes:
arteria-runfolder
Manages the state of an Illumina runfolder by monitoring the status files output by the instrument. It also allows the state to be changed through its REST interface.
arteria-bcl2fastq
Provides a REST interface for Illumina’s bcl2fastq software. It includes a simple scheduler to efficiently manage multiple bcl2fastq instances on a single server.
arteria-checksum
Runs md5sum checks and reports whether all files have preserved their integrity.
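As an illustration of the kind of integrity check described for arteria-checksum, the sketch below verifies files against an md5sum-style checksum file using only Python's standard library. It is not the arteria-checksum implementation (which, per the description above, runs md5sum itself). It assumes each line has the form `<hex digest>  <relative path>`, and the function names `md5_of_file` and `verify_checksums` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_file: Path) -> Dict[str, bool]:
    """Map each file listed in an md5sum-style file to True if its digest matches.

    Paths are resolved relative to the directory holding the checksum file.
    Missing files are reported as False.
    """
    base_dir = checksum_file.parent
    results: Dict[str, bool] = {}
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        if name.startswith("*"):
            name = name[1:]  # md5sum marks binary mode with a leading '*'
        target = base_dir / name
        results[name] = target.is_file() and md5_of_file(target) == expected.lower()
    return results
```

A caller can treat the data as intact when `all(verify_checksums(path).values())` is true, and otherwise report the names whose value is `False`.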
Who can use it?
You can! We’ve open sourced the Arteria project under the MIT licence, and we hope that more organizations and individuals will join us in developing Arteria in the future.
Publications, Presentations, Blog posts, etc
We have published a paper describing Arteria, which can be found here: https://academic.oup.com/gigascience/article/8/12/giz135/5673459
Roman Valls has written two excellent blog posts related to Arteria which can be found here:
- Event driven automation for DNA sequencing centers with StackStorm and Arteria at UMCCR
- Productionalising cancer reporting
We have been featured by StackStorm:
Arteria has been presented at scientific conferences:
- Poster from AGBT 2017
- Beyond Cron and Bash - presentation at the Conference of Software Research Engineering in Manchester 2016
Who are we?
The Arteria project originated at the SNP&SEQ Technology Platform, a node of the National Genomics Infrastructure at SciLifeLab. It has since been adopted at Clinical Genomics Uppsala at SciLifeLab and at the University of Melbourne Centre for Cancer Research.