gtdb-local

License: CC0-1.0

JavaScript implementation of a PouchDB database that hosts the data of GTDB (Genome Taxonomy Database) locally. Written in TypeScript.

GTDB Version: 95

Please cite the original authors:
Parks, D. H., et al. (2018). "A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life." Nature Biotechnology.

Please read this first

Installing this package builds the GTDB database with PouchDB in a hidden folder, .gtdb-local, in your home directory. It can take up almost 400 MB of disk space.

If you want to install the package without setting up the database, you need to set an environment variable that skips the postinstall setup. Take a look at the Install section.

Also, this is a very early version. Please use at your own risk.

Install

We can install the module like this:

npm install gtdb-local

This will install the module and set up the database files in your home directory.

If you want to install just the package, without setting up the database:

skip_setup='yes' npm install gtdb-local

After installing this way, there will be no data. We must download the data from the GTDB website and then build the database and the index used for faster searches.
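Before connecting, it can help to check whether the setup step has populated the database folder. The sketch below relies only on what is stated above, that the data lives in .gtdb-local under your home directory; the function name gtdbFolderLooksPopulated() is hypothetical and not part of gtdb-local, and a non-empty folder is only a hint, not proof that the database is complete.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper (not part of gtdb-local).
// Returns true when <home>/.gtdb-local exists, is a directory and is not empty.
// This only suggests that setup ran; it does not verify that the data is complete.
function gtdbFolderLooksPopulated(home: string = os.homedir()): boolean {
  const dir = path.join(home, ".gtdb-local");
  if (!fs.existsSync(dir) || !fs.statSync(dir).isDirectory()) return false;
  return fs.readdirSync(dir).length > 0;
}

console.log(gtdbFolderLooksPopulated());
```

If this prints false after a skip_setup='yes' install, the download and build steps have not been done yet.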

Usage (assuming install and setup)

Select Proteobacteria genomes

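import { Gtdb } from 'gtdb-local'

const gtdb = new Gtdb()
gtdb.connectDB()
    .then((db) => {
        // filter on the phylum ('p') field of each genome document
        return db.find({
            selector: {
                p: 'Proteobacteria'
            }
        })
    })
    .then((data) => {
        // data.docs holds the Proteobacteria genome documents
    })
    .catch((err) => {
        console.error(err)
    })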
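Once find() resolves, the matching documents are in data.docs. As an illustration, the hypothetical helper below counts genomes per value of a rank. It assumes, based only on the p selector above, that each document stores its ranks under single-letter keys such as c for class; inspect a real document before relying on that. GenomeDoc and countByRank() are not part of gtdb-local.

```typescript
// Hypothetical minimal shape of a genome document. We assume rank names are
// stored under single-letter keys such as "p" (phylum) and "c" (class),
// mirroring the "p" selector used in the find() example.
interface GenomeDoc {
  _id: string;
  p?: string;
  c?: string;
  [rank: string]: unknown;
}

// Count how many genomes fall under each value of the given rank key.
function countByRank(docs: GenomeDoc[], rank: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const doc of docs) {
    const value = doc[rank];
    // Skip documents that lack the rank or store something other than a string.
    if (typeof value !== "string") continue;
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  return counts;
}
```

For example, calling countByRank(data.docs, 'c') inside the second then() would give a Map from class name to genome count.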

Search taxonomy info for genomes in bulk

We added the GTDB data to PouchDB using the genome's NCBI accession code as the main index (new as of 2019), so genomes can be fetched in bulk by ID with allDocs().

import { Gtdb } from 'gtdb-local'

const genomeIds = [
    'UBA10210',
    'RS_GCF_002214165.1',
    'UBA10214',
    'GB_GCA_001871475.1',
    'GB_GCA_001871595.1',
    'UBA8261',
    'GB_GCA_001871495.1',
    'GB_GCA_001871535.1',
    'GB_GCA_001889985.1',
    'GB_GCA_002763345.1'
]

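const gtdb = new Gtdb()
gtdb.connectDB()
    .then((db) => {
        const searchOptions = {
            include_docs: true,
            keys: genomeIds
        }
        return db.allDocs(searchOptions)
    })
    .then((results: any) => {
        // results.rows has one row per requested key; row.doc is the genome document
        // (rows for IDs that are not in the database have no doc)
    })
    .catch((err) => {
        console.error(err)
    })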
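Two small helpers for bulk lookups, sketched under assumptions. toGtdbKey() is hypothetical: it assumes, from the IDs listed above, that RefSeq (GCF_) accessions are stored with an RS_ prefix and GenBank (GCA_) accessions with a GB_ prefix, and it passes other IDs (such as UBA ones) through unchanged. splitRows() relies on PouchDB's allDocs() behavior with keys and include_docs: a requested key with no live document comes back as a row without a doc, so it is reported as missing instead of being silently dropped. AllDocsRow is a minimal local type, not PouchDB's own typing.

```typescript
// Hypothetical helper: map a plain NCBI assembly accession to the key format
// seen in the example IDs (RS_ for RefSeq GCF_, GB_ for GenBank GCA_).
// Anything else, including already-prefixed or UBA IDs, is returned unchanged.
function toGtdbKey(accession: string): string {
  if (accession.startsWith("GCF_")) return `RS_${accession}`;
  if (accession.startsWith("GCA_")) return `GB_${accession}`;
  return accession;
}

// Minimal local type for the parts of an allDocs() row used here.
// Unknown keys come back with an `error` field; deleted docs have doc === null.
interface AllDocsRow<T> {
  key: string;
  doc?: T | null;
  error?: string;
}

// Separate rows that carry a document from keys that matched nothing.
function splitRows<T>(rows: AllDocsRow<T>[]): { found: T[]; missing: string[] } {
  const found: T[] = [];
  const missing: string[] = [];
  for (const row of rows) {
    if (row.doc) found.push(row.doc);
    else missing.push(row.key);
  }
  return { found, missing };
}
```

In the example above, keys: ['GCF_002214165.1'].map(toGtdbKey) could go in searchOptions, and splitRows(results.rows) would list any IDs that were not found.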

Balanced selection of 100 random genomes from Proteobacteria

Some clades of organisms have been sequenced much more than others. For this reason, we implemented balanced sampling of the genomes under a GTDB node.

Note that we don't need to connect to the database, because the algorithm uses the Newick tree rather than the database.

This is just a wrapper around selectBalancedLeafs() from Phylogician-TS.

import { Gtdb } from 'gtdb-local'

const gtdb = new Gtdb()
const data = gtdb.selectBalancedSample('p', 'Proteobacteria', [], 100)
// do something with the 100 genomes randomly selected from Proteobacteria clade
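To make the idea of a balanced sample concrete, the sketch below splits a sample budget as evenly as possible among the children of each node, handing slots that a small subtree cannot use to its siblings. This is not the selectBalancedLeafs() algorithm from Phylogician-TS, whose details are not described here, and unlike the real method it is deterministic rather than random; the TreeNode type and balancedSample() function are hypothetical.

```typescript
// A node in a hypothetical rooted tree; leaves have no children.
interface TreeNode {
  name: string;
  children?: TreeNode[];
}

// Collect every leaf name under (and including) a node.
function leaves(node: TreeNode): string[] {
  if (!node.children || node.children.length === 0) return [node.name];
  return node.children.flatMap(leaves);
}

// Pick up to n leaves so that each child subtree gets an (almost) equal share.
// Subtrees with fewer leaves than their share give the surplus to siblings.
// Deterministic: within a subtree, the first leaves are taken.
function balancedSample(node: TreeNode, n: number): string[] {
  const kids = node.children ?? [];
  if (kids.length === 0) return n > 0 ? [node.name] : [];
  const sizes = kids.map((k) => leaves(k).length);
  const quota = new Array<number>(kids.length).fill(0);
  let remaining = Math.min(n, sizes.reduce((a, b) => a + b, 0));
  // Hand out one slot at a time, round-robin, to children that still have room.
  while (remaining > 0) {
    for (let i = 0; i < kids.length && remaining > 0; i++) {
      if (quota[i] < sizes[i]) {
        quota[i]++;
        remaining--;
      }
    }
  }
  return kids.flatMap((k, i) => balancedSample(k, quota[i]));
}

// One clade with 8 genomes, one with 2.
const tree: TreeNode = {
  name: "root",
  children: [
    { name: "A", children: Array.from({ length: 8 }, (_, i) => ({ name: `a${i + 1}` })) },
    { name: "B", children: [{ name: "b1" }, { name: "b2" }] },
  ],
};

console.log(balancedSample(tree, 4));
```

With 8 leaves under one child and 2 under the other, a budget of 4 takes 2 leaves from each, whereas a uniform random draw would be expected to take about 3 from the larger clade.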

Uninstall

We can uninstall gtdb-local with npm:

npm uninstall gtdb-local

To get rid of the main database files in our home directory, we can just remove ~/.gtdb-local.

Be careful when removing directories.
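If you prefer to remove the folder from a script, the sketch below deletes .gtdb-local from your home directory with Node's fs.rmSync() (available since Node 14.14). The function name removeGtdbFolder() is hypothetical and not part of gtdb-local; as a precaution it refuses to touch the path if it is a symbolic link or not a directory.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper (not part of gtdb-local).
// Removes <home>/.gtdb-local and returns the deleted path, or null when there
// was nothing safe to delete. A symbolic link or a plain file is left alone.
function removeGtdbFolder(home: string = os.homedir()): string | null {
  const dir = path.join(home, ".gtdb-local");
  if (!fs.existsSync(dir)) return null;
  const info = fs.lstatSync(dir);
  if (info.isSymbolicLink() || !info.isDirectory()) return null;
  fs.rmSync(dir, { recursive: true });
  return dir;
}
```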

Documentation

Developer's Documentation

Todo

  • Version control of gtdb data

Written with ❤ in Typescript.
