PGCGAP - the Prokaryotic Genomics and Comparative Genomics Analysis Pipeline




      ____       ____      ____     ____       _        ____    
    U|  _"\ u U /"___|u U /"___| U /"___|u U  /"\  u  U|  _"\ u 
    \| |_) |/ \| |  _ / \| | u   \| |  _ /  \/ _ \/   \| |_) |/ 
     |  __/    | |_| |   | |/__   | |_| |   / ___ \    |  __/   
     |_|        \____|    \____|   \____|  /_/   \_\   |_|      
     ||>>_      _)(|_    _// \\    _)(|_    \\    >>   ||>>_    
    (__)__)    (__)__)  (__)(__)  (__)__)  (__)  (__) (__)__)   

Introduction

PGCGAP is a pipeline for prokaryotic comparative genomics analysis. It accepts paired-end reads, Oxford Nanopore reads, or PacBio reads as input. Beyond genome assembly, gene prediction, and annotation, a single command can produce common comparative genomics results: phylogenetic trees of single-copy core proteins and core SNPs, pan-genome analysis, whole-genome Average Nucleotide Identity (ANI), orthogroups and orthologs, COG annotations, substitutions (SNPs) and insertions/deletions (indels), and mining of antimicrobial resistance and virulence genes.

Installation

The software has been tested successfully on Windows WSL, Linux x64, and macOS. Because it depends on a large number of other programs, installing it with Bioconda is recommended.

Step1: Install PGCGAP

$conda create -n pgcgap python=3
$conda activate pgcgap
$conda install pgcgap

Users in China can instead run:

$conda install -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda pgcgap

Step2: Set up the COG database (run this once after installing pgcgap for the first time)

$conda activate pgcgap
$pgcgap --setup-COGdb
$conda deactivate
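
Before running the pipeline, it can help to confirm that the executable is visible inside the activated environment. The sketch below relies only on the POSIX `command -v` builtin; the helper name `require_tool` is hypothetical and not part of PGCGAP.

```shell
# require_tool is a hypothetical helper, not part of PGCGAP.
# It succeeds if the named program is on PATH and prints where it was found.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $(command -v "$1")"
    else
        echo "not found on PATH: $1" >&2
        return 1
    fi
}
```

With the environment active (`conda activate pgcgap`), calling `require_tool pgcgap` should report a path inside that environment.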

Users who have Docker installed can alternatively pull PGCGAP as a container image.

$docker pull quay.io/biocontainers/pgcgap:<tag>

(see pgcgap/tags for valid values for <tag>)

Required dependencies

Usage





Examples

Generating Input files

Working directory

The directory where the PGCGAP software runs.

Assemble

Paired-end reads of all strains, or PacBio reads, or Oxford Nanopore reads, in a directory (default: ./Reads/Illumina/ under the working directory).

Annotate

Genome files (complete or draft) in a directory (default: Results/Assembles/Scaf/Illumina under the working directory).

ANI

QUERY_LIST and REFERENCE_LIST files containing full paths to genomes, one per line (default: scaf.list under the working directory). If the “--Assemble” function is run first, the list file is generated automatically.
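
When the genomes were not produced by “--Assemble”, a list file of this shape has to be written by hand. The following is a minimal sketch, assuming the genome files sit directly in one directory; the helper name `make_genome_list` and the default `*.fa` pattern are assumptions for illustration, not PGCGAP conventions.

```shell
# make_genome_list is a hypothetical helper, not part of PGCGAP.
# It writes the absolute path of each matching genome file, one per line.
# Arguments: genome directory, output list file, optional file pattern.
make_genome_list() {
    genome_dir="$1"
    list_file="$2"
    pattern="${3:-*.fa}"   # assumed suffix; pass your own pattern if it differs
    # Resolve the directory to an absolute path so every entry is a full path.
    abs_dir="$(cd "$genome_dir" && pwd)" || return 1
    find "$abs_dir" -maxdepth 1 -type f -name "$pattern" | sort > "$list_file"
}
```

For example, `make_genome_list Results/Assembles/Scaf/Illumina scaf.list` would write the list into the working directory.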

MASH

Genome files (complete or draft) in a directory (default: Results/Assembles/Scaf/Illumina under the working directory).

CoreTree

An amino acid file (with the “.faa” suffix) and a nucleotide file (with the “.ffn” suffix) for each strain, placed in two separate directories (default: “./Results/Annotations/AAs/” and “./Results/Annotations/CDs/”). The “.faa” and “.ffn” files of the same strain must share the same prefix. Protein IDs and gene IDs must start with the strain name. Prokka is the suggested tool for generating these input files. If the “--Annotate” function is run first, the files are generated automatically. If “--CDsPath” is set to “NO”, the nucleotide files are not needed.
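
The shared-prefix requirement is easy to violate when files come from different sources. The sketch below checks that every “.faa” file has a “.ffn” file with the same prefix; `check_paired_prefixes` is a hypothetical helper name, and the check covers file names only, not the IDs inside the files.

```shell
# check_paired_prefixes is a hypothetical helper, not part of PGCGAP.
# For every <prefix>.faa in the first directory it expects <prefix>.ffn
# in the second directory, prints each missing file, and returns 1 if any
# are missing.
check_paired_prefixes() {
    aa_dir="$1"
    cds_dir="$2"
    status=0
    for faa in "$aa_dir"/*.faa; do
        [ -e "$faa" ] || continue          # no .faa files at all
        prefix="$(basename "$faa" .faa)"
        if [ ! -f "$cds_dir/$prefix.ffn" ]; then
            echo "missing: $cds_dir/$prefix.ffn"
            status=1
        fi
    done
    return "$status"
}
```

With the default layout this would be called as `check_paired_prefixes ./Results/Annotations/AAs ./Results/Annotations/CDs`.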

OrthoF

A set of protein sequence files (one per species) in FASTA format in a directory (default: “./Results/Annotations/AAs/”). If the “--Annotate” function is run first, the files are generated automatically.

Pan

GFF3 files (with the “.gff” suffix), one per strain, placed in a directory (default: ./Results/Annotations/GFF/). Each file must contain the nucleotide sequence at its end; all GFF3 files created by Prokka are valid. Protein sequence files (one per species) in FASTA format are also needed in another directory (default: “./Results/Annotations/AAs/”). If the “--Annotate” function is run first, the files are generated automatically.
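
In GFF3, an embedded sequence section is introduced by a `##FASTA` line, which is how Prokka appends the nucleotide sequence. A minimal sketch for spotting GFF3 files that lack this section follows; the helper name `list_gff_without_sequence` is hypothetical, and the check only looks for the directive, not for valid sequence data after it.

```shell
# list_gff_without_sequence is a hypothetical helper, not part of PGCGAP.
# It prints every .gff file in the directory that has no "##FASTA" line.
list_gff_without_sequence() {
    gff_dir="$1"
    for gff in "$gff_dir"/*.gff; do
        [ -e "$gff" ] || continue          # no .gff files at all
        grep -q '^##FASTA' "$gff" || echo "$gff"
    done
}
```

Running `list_gff_without_sequence ./Results/Annotations/GFF` prints nothing when every file carries its sequence.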

pCOG

Amino acid files (with the “.faa” suffix), one per strain, placed in a directory (default: ./Results/Annotations/AAs/). If the “--Annotate” function is run first, the files are generated automatically.

VAR

AntiRes

Genome files (complete or draft) in a directory (default: Results/Assembles/Scaf/Illumina under the working directory).

STREE

A single multi-FASTA file of sequences, which can be protein, DNA, or codon sequences.

Output Files

Assemble

Annotate

ANI

MASH

CoreTree

OrthoF

Pan

pCOG

VAR

AntiRes

STREE

License

PGCGAP is free software, licensed under GPLv3.

Feedback and Issues

Please report any issues to the issues page or email us at liaochenlanruo@webmail.hzau.edu.cn.

Citation

FAQ

Q1 The VAR function failed to produce annotated VCFs and Core results

Check the log file named "strain_name.log" in the Results/Variants/<strain_name>/ directory. If it contains a sentence like "WARNING: All frames are zero! This seems rather odd, please check that 'frame' information in your 'genes' file is accurate.", this is a snpEff error. Installing JDK 8 resolves it.

$conda install java-jdk=8.0.112
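
With many strains, checking each log by hand is tedious. The sketch below scans the per-strain directories for the warning text quoted above; the helper name `list_frame_warning_logs` is hypothetical, and it assumes each log ends in ".log" and sits one level below the variants directory.

```shell
# list_frame_warning_logs is a hypothetical helper, not part of PGCGAP.
# It prints every <variants_dir>/<strain>/*.log that contains the snpEff
# "All frames are zero" warning.
list_frame_warning_logs() {
    variants_dir="${1:-Results/Variants}"
    for log in "$variants_dir"/*/*.log; do
        [ -e "$log" ] || continue          # no log files found
        if grep -q "All frames are zero" "$log"; then
            echo "$log"
        fi
    done
}
```

Calling `list_frame_warning_logs` from the working directory lists the affected strains, if any.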


Q2 Could not determine version of minced please install version 2 or higher

This error can occur when running the Annotate function. The error message looks like the following:

Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.UnsupportedClassVersionError: minced has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:495)
[01:09:40] Could not determine version of minced - please install version 2.0 or higher

Downgrading minced to version 0.3 resolves this problem.

$conda install minced=0.3
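
The version numbers in the error message explain the mismatch. Java class file major versions correspond to Java releases by subtracting 44 (for Java 5 and later), so 52 is Java 8 and 55 is Java 11: minced was compiled for Java 11 while the active runtime is Java 8. The helper below is a small illustrative sketch of that arithmetic; the name `class_version_to_java` is hypothetical.

```shell
# class_version_to_java is a hypothetical helper for illustration only.
# It converts a class file version such as "55.0" to the Java release
# that introduced it, using release = major - 44 (valid for Java 5+).
class_version_to_java() {
    major="${1%%.*}"            # keep the major part, dropping ".0"
    echo "$(( major - 44 ))"
}
```

For the message above, `class_version_to_java 55.0` and `class_version_to_java 52.0` give the required and the available Java release respectively.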


Q3 dyld: Library not loaded: @rpath/libcrypto.1.0.0.dylib

This error may occur when running the "VAR" function on macOS. It is caused by openssl and can be resolved as follows:

#First, install brew if it is not already installed
$ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

#Install openssl with brew
$brew install openssl

#Create the soft link for libraries
$ln -s /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib /usr/local/lib/

$ln -s /usr/local/opt/openssl/lib/libssl.1.0.0.dylib /usr/local/lib/
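
Running `ln -s` a second time fails if the link already exists, and it silently creates a dangling link if the source library is absent. The sketch below wraps the same linking step with both checks; `link_if_missing` is a hypothetical helper name, and the paths passed to it are the ones from the commands above.

```shell
# link_if_missing is a hypothetical helper, not part of PGCGAP.
# It symlinks a library into a destination directory unless the source is
# absent or something with that name is already present there.
link_if_missing() {
    src="$1"
    dest_dir="$2"
    dest="$dest_dir/$(basename "$src")"
    if [ ! -e "$src" ]; then
        # Can happen if the installed openssl does not ship this library version.
        echo "source not found: $src" >&2
        return 1
    fi
    if [ -e "$dest" ] || [ -L "$dest" ]; then
        echo "already present: $dest"
    else
        ln -s "$src" "$dest" && echo "linked: $dest"
    fi
}
```

For example, `link_if_missing /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib /usr/local/lib` performs the first link only when it is needed and possible.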


Q4 Use of uninitialized value in require at Encode.pm line 61

This warning may appear when running the "Pan" function. It comes from the Roary software; line 61 of Encode.pm reads "require Encode::ConfigLocal;". Users can ignore the warning.

Updates

