Preprocessing

Assembly

Firstly, you will have to assemble your set of reads into contigs. For this purpose, you can use metaSPAdes and MEGAHIT as MetaCoAG currently supports assembly graphs produced from these assemblers. Support for other assemblers will be added in future.

metaSPAdes

SPAdes is an assembler based on the de Bruijn graph approach. metaSPAdes is the dedicated metagenomic assembler of SPAdes. Use metaSPAdes (SPAdes in metagenomics mode) software to assemble reads into contigs. A sample command is given below.

spades --meta -1 Reads_1.fastq -2 Reads_2.fastq -o /path/output_folder -t 8

MEGAHIT

MEGAHIT is an assembler based on the de Bruijn graph approach. Use MEGAHIT software to assemble reads into contigs. A sample command is given below.

megahit -1 Reads_1.fastq -2 Reads_2.fastq --k-min 21 --k-max 77 -o /path/output_folder -t 8

Note: Currently, MetaCoAG supports GFA file format for the assembly graph file. The MEGAHIT toolkit will produce a FASTG file which you can convert to GFA format using fastg2gfa.

fastg2gfa final.fastg > final.gfa

Support for FASTG files will be added in the near future.

Flye

Flye is a long-read assembler based on the de Bruijn graph approach. metaFlye is the metagenomic version of Flye. Use metaFlye to assemble reads into contigs. A sample command is given below.

flye --meta --pacbio-raw Reads.fastq --out-dir /path/output_folder --threads 8

How to get the abundance.tsv file

You can use CoverM to get the coverage of contigs. You can run the following commands to get the abundance.tsv file.

coverm contig -1 reads_1.fastq -2 reads_2.fastq -r contigs.fasta -o abundance.tsv -t 8
sed -i '1d' abundance.tsv   # remove the header of the file

You can use the -c (or --coupled) option of CoverM if you have multiple samples. Please refer the CoverM contig documentation for further details.

The resulting abundance.tsv file can be directly used in MetaCoAG.

Once you have obtained the assembly output and the abundance.tsv file, you can run MetaCoAG.