Manual
NAME
Ray - assemble genomes in parallel using the message-passing interface
SYNOPSIS
mpiexec -n 80 Ray -k 31 -p l1_1.fastq l1_2.fastq -p l2_1.fastq l2_2.fastq -o test
mpiexec -n 80 Ray Ray.conf # with commands in a file
mpiexec -n 80 Ray -k 31 -detect-sequence-files SampleDirectory # auto-detection
mpiexec -n 10 Ray -mini-ranks-per-rank 7 Ray.conf # with mini-ranks
DESCRIPTION
The Ray genome assembler is built on top of RayPlatform, a generic plugin-based
distributed and parallel compute engine that uses the message-passing interface (MPI)
to exchange messages between processes.
Ray targets several applications:
- de novo genome assembly (with Ray vanilla)
- de novo meta-genome assembly (with Ray Méta)
- de novo transcriptome assembly (works, but not extensively tested)
- quantification of contig abundances
- quantification of microbiome consortia members (with Ray Communities)
- quantification of transcript expression
- taxonomy profiling of samples (with Ray Communities)
- gene ontology profiling of samples (with Ray Ontologies)
- comparison of DNA samples using words (Ray -run-surveyor ...; see Ray Surveyor options)
-help
Displays this help page.
-version
Displays Ray version and compilation options.
Run Ray in pure MPI mode
mpiexec -n 80 Ray ...
Run Ray with mini-ranks on 10 machines with 8 cores per machine (MPI and IEEE POSIX threads)
mpiexec -n 10 Ray -mini-ranks-per-rank 7 ...
Run Ray on one core only (still needs MPI)
Ray ...
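The mini-ranks example above (10 machines, 8 cores per machine, 7 mini-ranks per rank) can be reproduced with a little arithmetic. The sketch below assumes one MPI rank per machine and one core per rank reserved for a communication thread; that reservation is an assumption inferred from the 8-versus-7 figures, and plan_mini_ranks and command_line are hypothetical helper names, not part of Ray.

```python
def plan_mini_ranks(machines, cores_per_machine, comm_threads_per_rank=1):
    """Derive mpiexec and -mini-ranks-per-rank values for a cluster.

    Assumption: one MPI rank per machine, and comm_threads_per_rank cores
    on each machine are left free for message passing.
    """
    if cores_per_machine <= comm_threads_per_rank:
        raise ValueError("not enough cores left for mini-ranks")
    mini_ranks = cores_per_machine - comm_threads_per_rank
    return {
        "mpi_ranks": machines,
        "mini_ranks_per_rank": mini_ranks,
        "total_mini_ranks": machines * mini_ranks,
    }


def command_line(plan, config="Ray.conf"):
    """Format the launch command in the style used by this manual."""
    return (
        f"mpiexec -n {plan['mpi_ranks']} Ray "
        f"-mini-ranks-per-rank {plan['mini_ranks_per_rank']} {config}"
    )


plan = plan_mini_ranks(machines=10, cores_per_machine=8)
print(command_line(plan))
```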
Using a configuration file
Ray can be launched with its options stored in a configuration file:
mpiexec -n 16 Ray Ray.conf
The configuration file can include comments (starting with #).
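A configuration file can also be generated by a script. The sketch below assumes that the file holds the same options as the command line, one option per line, with comments on their own lines; the manual only states that comments start with #, so the exact layout is an assumption, and render_config is a hypothetical helper.

```python
def render_config(entries):
    """Render (comment, arguments) pairs as configuration file text.

    comment may be None; arguments is a list of command-line tokens.
    Assumption: one option per line, comments on separate lines.
    """
    lines = []
    for comment, arguments in entries:
        if comment:
            lines.append(f"# {comment}")
        lines.append(" ".join(arguments))
    return "\n".join(lines) + "\n"


config_text = render_config([
    ("k-mer length", ["-k", "31"]),
    ("first paired-end library", ["-p", "l1_1.fastq", "l1_2.fastq"]),
    (None, ["-o", "test"]),
])

# Write the text to Ray.conf, then launch with: mpiexec -n 16 Ray Ray.conf
print(config_text, end="")
```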
K-mer length
-k kmerLength
Selects the length of k-mers. The default value is 21.
It must be odd because reverse-complement vertices are stored together.
The maximum length is set at compile time by CONFIG_MAXKMERLENGTH.
Larger k-mers use more memory.
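The odd-length requirement can be checked by brute force. With an odd k, no k-mer can equal its own reverse complement, because the middle base would have to be its own complement; with an even k such words exist (ACGT is one). The sketch below is an illustration only: canonical is a hypothetical helper showing one way to store a k-mer and its reverse complement together, not Ray's internal representation.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(kmer):
    """Return the reverse complement of a DNA word."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def self_complementary_kmers(k):
    """List every k-mer that equals its own reverse complement."""
    words = ("".join(bases) for bases in product("ACGT", repeat=k))
    return [word for word in words if word == reverse_complement(word)]


def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


for k in (3, 4, 5):
    print(k, len(self_complementary_kmers(k)))
```

For odd k the list is empty, so every k-mer pairs with a distinct partner and the two can share one storage slot.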
Inputs
-detect-sequence-files SampleDirectory
Detects files in a directory automatically.
This option generates the corresponding commands automatically: LoadPairedEndReads (-p) and LoadSingleEndReads (-s).
-p leftSequenceFile rightSequenceFile [averageOuterDistance standardDeviation]
Provides two files containing paired-end reads.
averageOuterDistance and standardDeviation are automatically computed if not provided.
LoadPairedEndReads is equivalent to -p
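When the two optional values are supplied by hand, they are a mean and a standard deviation of observed outer distances. The sketch below shows how such values could be derived from a list of measured distances and appended to the -p option. It is not Ray's own estimator; rounding to integers is an assumption, and library_statistics and paired_option are hypothetical helpers.

```python
from statistics import mean, stdev


def library_statistics(outer_distances):
    """Return (average, standard deviation) rounded to integers."""
    if len(outer_distances) < 2:
        raise ValueError("need at least two observed distances")
    return round(mean(outer_distances)), round(stdev(outer_distances))


def paired_option(left_file, right_file, outer_distances=None):
    """Build the tokens of a -p option, with or without explicit statistics."""
    tokens = ["-p", left_file, right_file]
    if outer_distances is not None:
        average, deviation = library_statistics(outer_distances)
        tokens += [str(average), str(deviation)]
    return tokens


# Hypothetical measurements, for illustration only.
print(" ".join(paired_option("l1_1.fastq", "l1_2.fastq", [190, 200, 210])))
```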
-i interleavedSequenceFile [averageOuterDistance standardDeviation]
Provides one file containing interleaved paired-end reads.
averageOuterDistance and standardDeviation are automatically computed if not provided.
-s sequenceFile
Provides a file containing single-end reads.
LoadSingleEndReads is equivalent to -s
Outputs
-o outputDirectory
Specifies the directory for output files. The default is RayOutput.
Other name: -output
Ray Surveyor options
-run-surveyor
Runs Ray Surveyor to compare samples.
See Documentation/Ray-Surveyor.md
This workflow generates:
RayOutput/Surveyor/SimilarityMatrix.tsv is a similarity Gramian matrix based on shared DNA words
RayOutput/Surveyor/DistanceMatrix.tsv is a distance matrix (kernel-based).
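The relationship between the two matrices can be sketched as follows. The example assumes that each Gramian entry counts the DNA words shared by two samples and that the distance is the usual kernel-induced one, d(i, j) = sqrt(K[i][i] + K[j][j] - 2 K[i][j]); whether Ray Surveyor uses exactly this formula is an assumption, and reverse complements are ignored for brevity.

```python
from math import sqrt


def kmers(sequence, k):
    """Return the set of words of length k found in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def gram_matrix(samples, k):
    """Similarity matrix: entry (i, j) counts words shared by samples i and j."""
    words = [kmers(sample, k) for sample in samples]
    return [[len(a & b) for b in words] for a in words]


def kernel_distance_matrix(gram):
    """Distance induced by a kernel (Gramian) matrix."""
    n = len(gram)
    return [
        [sqrt(gram[i][i] + gram[j][j] - 2 * gram[i][j]) for j in range(n)]
        for i in range(n)
    ]


# Toy sequences, for illustration only.
similarity = gram_matrix(["ACGTAC", "ACGTTT"], k=3)
distance = kernel_distance_matrix(similarity)
print(similarity)
print(distance)
```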
-read-sample-graph SampleName SampleGraphFile
Reads a sample graph (generated with -write-kmers)
Assembly options (defaults work well)
-disable-recycling
Disables read recycling during the assembly.
Reads are set free in 3 cases:
1. the distance did not match for a pair
2. the read has not met its mate
3. the library population indicates a wrong placement
See: Constrained traversal of repeats with paired sequences.
Sébastien Boisvert, Élénie Godzaridis, François Laviolette & Jacques Corbeil.
First Annual RECOMB Satellite Workshop on Massively Parallel Sequencing, March 26-27 2011, Vancouver, BC, Canada.
-debug-recycling
Debugs the recycling events
-ignore-seeds
Disables assembly by ignoring seeds.
-merge-seeds
Merges seeds initially to reduce running time.
-disable-scaffolder
Disables the scaffolder.
-minimum-seed-length minimumSeedLength
Changes the minimum seed length. The default is 100 nucleotides.
-minimum-contig-length minimumContigLength
Changes the minimum contig length. The default is 100 nucleotides.
-color-space
Runs in color-space
Needs csfasta files. Activated automatically if csfasta files are provided.
-use-maximum-seed-coverage maximumSeedCoverageDepth
Ignores any seed with a coverage depth above this threshold.
The default is 4294967295.
-use-minimum-seed-coverage minimumSeedCoverageDepth
Sets the minimum seed coverage depth.
Any path with a coverage depth lower than this will be discarded. The default is 0.
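The two coverage thresholds act as a band-pass filter on seeds. The sketch below illustrates that rule with the documented defaults (0 and 4294967295, which is 2**32 - 1, so nothing is filtered unless a threshold is set). The seed names and depths are invented, and keep_seed and filter_seeds are hypothetical helpers, not Ray functions.

```python
DEFAULT_MINIMUM_SEED_COVERAGE = 0
DEFAULT_MAXIMUM_SEED_COVERAGE = 4294967295  # 2**32 - 1


def keep_seed(coverage_depth,
              minimum=DEFAULT_MINIMUM_SEED_COVERAGE,
              maximum=DEFAULT_MAXIMUM_SEED_COVERAGE):
    """A seed is kept when its depth is neither below minimum nor above maximum."""
    return minimum <= coverage_depth <= maximum


def filter_seeds(seed_depths, **thresholds):
    """Return the names of seeds that pass both thresholds."""
    return [name for name, depth in seed_depths.items()
            if keep_seed(depth, **thresholds)]


# Invented depths, for illustration only.
seeds = {"seed-1": 3, "seed-2": 40, "seed-3": 900}
print(filter_seeds(seeds))
print(filter_seeds(seeds, minimum=5, maximum=100))
```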
Distributed storage engine (all these values are for each MPI rank)
-bloom-filter-bits bits
Sets the number of bits for the Bloom filter.
The default is auto (adaptive); 0 bits disables the Bloom filter.
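For readers unfamiliar with the data structure, a Bloom filter is a bit array that answers "possibly seen" or "definitely not seen" for an item, and more bits mean fewer false positives. The sketch below is a generic minimal implementation, not Ray's: the number of hash functions and the use of SHA-256 are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative, not Ray's code)."""

    def __init__(self, bits, hash_count=3):
        if bits <= 0:
            raise ValueError("bits must be positive")
        self.bits = bits
        self.hash_count = hash_count
        self.array = bytearray((bits + 7) // 8)

    def _positions(self, item):
        # Derive hash_count bit positions by salting one strong hash.
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, item):
        for position in self._positions(item):
            self.array[position // 8] |= 1 << (position % 8)

    def __contains__(self, item):
        return all(self.array[position // 8] & (1 << (position % 8))
                   for position in self._positions(item))


seen = BloomFilter(bits=1024)
seen.add("ACGTACGTACGTACGTACGTA")
print("ACGTACGTACGTACGTACGTA" in seen)
```

An added item is always reported as present; an item that was never added is usually, but not always, reported as absent.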
-hash-table-buckets buckets
Sets the initial number of buckets. Must be a power of 2.
Default value: 268435456
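A value can be checked for the power-of-2 requirement before launching a job. The default, 268435456, is 2**28. The sketch below also shows a common reason for such a constraint, namely that a hash value can be mapped to a bucket with a bit mask instead of a division; that Ray indexes buckets this way is an assumption, and both helpers are hypothetical.

```python
def is_power_of_two(value):
    """True when value is a positive integer with exactly one bit set."""
    return value > 0 and value & (value - 1) == 0


def bucket_index(hash_value, buckets):
    """Map a hash value to a bucket using a mask (needs a power-of-2 size)."""
    if not is_power_of_two(buckets):
        raise ValueError("buckets must be a power of 2")
    return hash_value & (buckets - 1)


print(is_power_of_two(268435456))
print(is_power_of_two(300000000))
print(bucket_index(1000003, 1024))
```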
-hash-table-buckets-per-group buckets
Sets the number of buckets per group for sparse storage
Default value: 64; must be >= 1 and <= 64.
-hash-table-load-factor-threshold threshold
Sets the load factor threshold for real-time resizing
Default value: 0.75, must be >= 0.5 and < 1
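The effect of the threshold can be sketched with a small model. It assumes that the load factor is the number of stored entries divided by the number of buckets, that the table grows when the load factor exceeds the threshold, and that it grows by doubling (which keeps the bucket count a power of 2). All three points are assumptions about the storage engine, and the helper names are hypothetical.

```python
def validate_threshold(threshold):
    """Enforce the documented range: >= 0.5 and < 1."""
    if not 0.5 <= threshold < 1:
        raise ValueError("threshold must be >= 0.5 and < 1")
    return threshold


def buckets_after_insert(entries, buckets, threshold=0.75):
    """Return the bucket count needed to hold entries under the threshold."""
    validate_threshold(threshold)
    while entries / buckets > threshold:
        buckets *= 2  # doubling preserves the power-of-2 property
    return buckets


print(buckets_after_insert(entries=768, buckets=1024))
print(buckets_after_insert(entries=769, buckets=1024))
```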
-hash-table-verbosity
Activates verbosity for the distributed storage engine
Biological abundances
-search searchDirectory
Provides a directory containing fasta files to be searched in the de Bruijn graph.
Biological abundances will be written to RayOutput/BiologicalAbundances
See Documentation/BiologicalAbundances.txt
-one-color-per-file
Sets one color per file instead of one per sequence.
By default, each sequence in each file has a different color.
For files with large numbers of sequences, using one single color per file may be more efficient.
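The difference between the two coloring modes can be sketched as follows. The file and sequence names are invented, and assign_colors is a hypothetical helper; it only illustrates how many distinct colors each mode produces, not how Ray stores them.

```python
def assign_colors(files, one_color_per_file=False):
    """Map (file name, sequence name) to an integer color.

    files is a list of (file name, list of sequence names) pairs.
    """
    colors = {}
    next_color = 0
    for file_name, sequence_names in files:
        for sequence_name in sequence_names:
            colors[(file_name, sequence_name)] = next_color
            if not one_color_per_file:
                next_color += 1  # default: a new color for each sequence
        if one_color_per_file:
            next_color += 1  # -one-color-per-file: a new color for each file
    return colors


# Invented names, for illustration only.
files = [("a.fasta", ["s1", "s2"]), ("b.fasta", ["s3"])]
print(len(set(assign_colors(files).values())))
print(len(set(assign_colors(files, one_color_per_file=True).values())))
```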
Taxonomic profiling with colored de Bruijn graphs
-with-taxonomy Genome-to-Taxon.tsv TreeOfLife-Edges.tsv Taxon-Names.tsv
Provides a taxonomy.
Computes and writes detailed taxonomic profiles.
See Documentation/Taxonomy.txt for details.
-gene-ontology OntologyTerms.txt Annotations.txt
Provides an ontology and annotations.
OntologyTerms.txt is fetched from http://geneontology.org
Annotations.txt is a 2-column file (EMBL_CDS handle & gene ontology identifier)
See Documentation/GeneOntology.txt
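A 2-column annotation file of this kind can be sanity-checked before a run. The sketch below assumes whitespace-separated columns (the manual does not state the delimiter) and skips lines that do not have exactly two fields. The handles and identifiers shown are invented, and parse_annotations is a hypothetical helper.

```python
from collections import defaultdict


def parse_annotations(lines):
    """Group gene ontology identifiers by EMBL_CDS handle.

    Assumption: two whitespace-separated columns per line.
    """
    terms = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        handle, identifier = fields
        terms[handle].add(identifier)
    return dict(terms)


# Invented records, for illustration only.
example = [
    "CDS0001\tGO:0000001",
    "CDS0001\tGO:0000002",
    "CDS0002\tGO:0000001",
    "",
]
print(parse_annotations(example))
```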
Other outputs
-enable-neighbourhoods
Computes contig neighbourhoods in the de Bruijn graph.
Output file: RayOutput/NeighbourhoodRelations.txt
-amos
Writes the AMOS file called RayOutput/AMOS.afg
An AMOS file contains read positions on contigs.
Can be opened with software that has a graphical user interface.
-write-kmers
Writes k-mer graph to RayOutput/kmers.txt
The resulting file is not utilised by Ray.
The resulting file is very large.
-graph-only
Exits after building the graph.
-write-read-markers
Writes read markers to disk.
-write-seeds
Writes seed DNA sequences to RayOutput/Rank
Ray is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3 of the License. This website is also available at sebhtml.github.io/Ray.web.