Instructions for Variant Database
The program vdb allows one to search the GISAID dataset of SARS-CoV-2 (hCoV-19) viruses for spike mutation patterns using a natural syntax. The two main types of objects that can be manipulated are groups of isolates (“clusters”) and groups of mutations (“patterns”). Clusters can be obtained by searching for patterns, and patterns can be obtained by examining clusters.
The default cluster to search is the collection of all isolates (“world”). To search for all isolates from the United States, enter “from US” or just “us”. A cluster or pattern can be assigned to a variable for later use. For example, this is a command that defines a variable with all viruses collected in the United States containing both spike mutations L452R and E484K:
a = us w/ L452R E484K
To search instead for viruses containing at least one of the mutations L452R and E484K, give the minimum number of matching mutations before the pattern:
b = us w/ 1 L452R E484K
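The difference between these two searches can be modeled in a few lines of Python. This is only a sketch of the matching rule described here, not vdb's implementation; the toy isolate records and the `containing` helper are hypothetical.

```python
# Hypothetical toy data: each isolate name maps to its set of spike mutations.
isolates = {
    "iso1": {"L452R", "E484K", "D614G"},
    "iso2": {"L452R", "D614G"},
    "iso3": {"E484K", "D614G"},
    "iso4": {"D614G"},
}

def containing(cluster, pattern, n=None):
    """Keep isolates matching the whole pattern, or at least n of its mutations."""
    needed = len(pattern) if n is None else n
    return {name: muts for name, muts in cluster.items()
            if len(muts & pattern) >= needed}

pattern = {"L452R", "E484K"}
a = containing(isolates, pattern)        # models: a = us w/ L452R E484K
b = containing(isolates, pattern, n=1)   # models: b = us w/ 1 L452R E484K

print(sorted(a))  # ['iso1']
print(sorted(b))  # ['iso1', 'iso2', 'iso3']
```

Without a number, every mutation in the pattern must be present; with a number n, any n or more of them suffice.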
Clusters can be filtered by Pango lineage, date, number of mutations, country or US state to define new clusters. Patterns can be obtained either by calculating the consensus pattern of a cluster (using the consensus command) or by listing the most frequent distinct patterns in a cluster (using the patterns command).
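The two ways of obtaining a pattern from a cluster can be illustrated with a small Python sketch. The data are made up, the function names merely mirror the vdb commands, and the majority threshold used for the consensus is an assumption rather than vdb's documented rule.

```python
from collections import Counter

# Hypothetical cluster: one set of spike mutations per isolate.
cluster = [
    {"D614G", "L452R"},
    {"D614G", "L452R"},
    {"D614G", "E484K"},
    {"D614G"},
]

def consensus(cluster):
    """Mutations found in more than half of the isolates (assumed threshold)."""
    counts = Counter(m for muts in cluster for m in muts)
    return {m for m, c in counts.items() if c > len(cluster) / 2}

def patterns(cluster, n=3):
    """The n most frequent distinct mutation patterns, with their counts."""
    counts = Counter(frozenset(muts) for muts in cluster)
    return [(sorted(p), c) for p, c in counts.most_common(n)]

print(sorted(consensus(cluster)))  # ['D614G']
print(patterns(cluster, n=1))      # [(['D614G', 'L452R'], 2)]
```

The consensus summarizes the cluster as a single pattern, while the patterns listing shows which exact combinations of mutations occur most often.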
Below is a quick reference to all of the commands available. Full documentation is here.
Adjusting the display
The font size and number of rows in the terminal display of vdb.live can be quickly adjusted. Enter font followed by a number between 6 and 60 to change the font size. Enter rows followed by a number between 6 and 60 to change the number of rows in the terminal. Changing the font size automatically changes the number of rows.
cluster = group of viruses           < > = user input    n = an integer
pattern = group of mutations         [ ] = optional
"world" = all viruses in database    → result
If no cluster is entered, all viruses will be used (this is the built-in "world" cluster).
To define a variable for a cluster or pattern: <name> = cluster or pattern
To check whether two clusters or patterns are equal: <item1> == <item2>
To count a cluster or pattern in a variable: count <variable name>
Set operations +, -, and * (intersection) can be applied to clusters or patterns
The result, if any, of the previous command is available in the variable last.
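As a rough analogy, the set operations behave like operations on Python sets of isolate identifiers. The sketch below is illustrative only: the identifiers are hypothetical, and reading + as a union is an assumption based on the usual meaning of the operator.

```python
# Hypothetical clusters, represented as sets of isolate identifiers.
a = {"iso1", "iso2", "iso3"}
b = {"iso2", "iso3", "iso4"}

union = a | b          # models: a + b (assumed to be a union)
difference = a - b     # models: a - b
intersection = a & b   # models: a * b

print(sorted(union))         # ['iso1', 'iso2', 'iso3', 'iso4']
print(sorted(difference))    # ['iso1']
print(sorted(intersection))  # ['iso2', 'iso3']
print(a == b)                # models: a == b, prints False
print(len(intersection))     # models: count, prints 2
```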
<cluster> from <country or state>           → cluster
<cluster> containing [<n>] <pattern>        → cluster    alias with, w/        matches for ≥ n mutations
<cluster> not containing [<n>] <pattern>    → cluster    alias without, w/o    full pattern
<cluster> before <date>                     → cluster
<cluster> after <date>                      → cluster
<cluster> > or < or # <n>                   → cluster    filter by # of mutations
<cluster> named <state_id or EPI_ISL#>      → cluster
<cluster> lineage <Pango lineage>           → cluster
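Because each filter takes a cluster and returns a cluster, filters can be chained. The Python sketch below models a date filter followed by a mutation-count filter. The records and helper names are hypothetical, and treating the date comparison as strict is an assumption.

```python
from datetime import date

# Hypothetical isolates: (name, collection date, spike mutations).
isolates = [
    ("iso1", date(2021, 1, 15), {"D614G"}),
    ("iso2", date(2021, 3, 2), {"D614G", "L452R"}),
    ("iso3", date(2021, 4, 20), {"D614G", "L452R", "E484K"}),
]

def after(cluster, d):
    """Isolates collected after date d (strict comparison is an assumption)."""
    return [iso for iso in cluster if iso[1] > d]

def more_than(cluster, n):
    """Isolates with more than n mutations, modelling '<cluster> > <n>'."""
    return [iso for iso in cluster if len(iso[2]) > n]

recent = after(isolates, date(2021, 2, 1))
result = more_than(recent, 2)

print([name for name, _, _ in result])  # ['iso3']
```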
Commands to find mutation patterns
consensus [for] <cluster or country or state>    → pattern
patterns [in] [<n>] <cluster>                    → pattern    lists n patterns
list [<n>] <cluster>
[list] countries [for] <cluster>
[list] states [for] <cluster>
[list] lineages [for] <cluster>
[list] trends [for] <cluster>
[list] frequencies [for] <cluster>             alias freq    frequency of individual mutations
[list] monthly [for] <cluster> [<cluster2>]    number of isolates per month or week
[list] weekly [for] <cluster> [<cluster2>]     as a fraction of the # in cluster2
[list] patterns                                lists built-in and user defined patterns
[list] clusters                                lists built-in and user defined clusters
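The monthly listing with a second cluster can be thought of as a per-month count divided by the count in cluster2 for the same month. The following Python sketch shows that arithmetic on made-up collection dates; it is not vdb's code, and the `monthly` helper is hypothetical.

```python
from collections import Counter
from datetime import date

def monthly(dates):
    """Count isolates per (year, month)."""
    return Counter((d.year, d.month) for d in dates)

# Hypothetical collection dates for a cluster and for a larger reference cluster.
cluster = [date(2021, 1, 5), date(2021, 2, 10), date(2021, 2, 20)]
cluster2 = [
    date(2021, 1, 5), date(2021, 1, 9), date(2021, 1, 12), date(2021, 1, 30),
    date(2021, 2, 10), date(2021, 2, 20), date(2021, 2, 25), date(2021, 2, 28),
]

counts = monthly(cluster)
totals = monthly(cluster2)

# Fraction of cluster2 isolates in each month that also belong to cluster.
fractions = {month: counts[month] / totals[month] for month in sorted(totals)}
print(fractions)  # {(2021, 1): 0.25, (2021, 2): 0.5}
```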
sort <cluster>                             (by date)
help [<command>]                           alias ?
char <Pango lineage>                       prints characteristics of lineage
testvdb                                    runs built-in tests of vdb
group lineages <lineage names>             define a lineage group    alias group lineage, lineage group
lineage groups                             lists defined lineage groups
clear <cluster name> or <lineage group>    clears the definition
count <cluster name or pattern name>
minimumPatternsCount = <n>
trendsLineageCount = <n>