vdb.live Documentation


The commands below can be used to search the mutational landscape of SARS-CoV-2 genomes.

Many commands have both a verbose form (list countries for cluster1) and a short form (countries cluster1).


Notation

A cluster is a group of viruses, usually obtained as the result of a search command.

A pattern is a list of one or more mutations, user-specified or the result of a consensus or patterns command.

In the command descriptions below, items to be specified by the user are indicated with angle brackets, < >.

Optional items are indicated with square brackets, [ ].

If a command returns a cluster or pattern, this is indicated following an arrow: → result

If no cluster is entered for a search command, all loaded viruses will be searched.

The set of all viruses loaded into the program is specified by the pre-defined cluster named "world".

Command keywords are not case sensitive.


Installation

Installation instructions to run vdb locally are given here.


Variables

To define a variable for a cluster or pattern: <name> = <cluster or pattern>

Variable names are case sensitive and can include letters or numbers.

To check whether two clusters or patterns are equal: <item1> == <item2>

To count a cluster or pattern in a variable: count <variable name>


last

The result, if any, of the previous command is available in the variable last.



Set operations

Set operations +, -, and * (intersection) can be applied to clusters or patterns.
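
As a rough illustration of what these operators mean, the sketch below models clusters as Python sets of virus names. The virus names are invented, and the sketch assumes that + denotes union and - denotes difference (only * is explicitly described above). It is an analogy, not vdb's implementation.

```python
# Hypothetical clusters, modeled as sets of made-up virus names.
cluster_a = {"virus1", "virus2", "virus3"}
cluster_b = {"virus2", "virus3", "virus4"}

union = cluster_a | cluster_b         # assumed meaning of: cluster_a + cluster_b
difference = cluster_a - cluster_b    # assumed meaning of: cluster_a - cluster_b
intersection = cluster_a & cluster_b  # cluster_a * cluster_b (intersection)

print(sorted(union))         # ['virus1', 'virus2', 'virus3', 'virus4']
print(sorted(difference))    # ['virus1']
print(sorted(intersection))  # ['virus2', 'virus3']
```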


Mutation patterns

If the loaded mutation list file contains spike protein mutations, then mutation patterns should be spike protein mutations. For example, E484K D614G.

Mutations can be separated by either a space or a comma.


Position and mutation information

If an integer by itself is entered into vdb, the residue at that reference location will be printed along with the number of occurrences and frequencies of mutations at that position.
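
The kind of per-position summary described above can be sketched in Python as follows. The data, the function name position_summary, and the assumption that every mutation is a simple substitution written like E484K are all hypothetical; vdb's actual output and handling of other mutation types may differ.

```python
import re
from collections import Counter

def position_summary(cluster, position):
    """Count substitutions at one position and give each one's frequency."""
    counts = Counter()
    for mutations in cluster.values():
        for mutation in mutations:
            # Assumed format: reference residue, position, new residue.
            match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
            if match and int(match.group(2)) == position:
                counts[mutation] += 1
    total = len(cluster)
    return {mutation: (n, n / total) for mutation, n in counts.items()}

# Hypothetical data: virus name -> set of spike mutations.
cluster = {
    "virusA": {"E484K", "D614G"},
    "virusB": {"E484K"},
    "virusC": {"E484Q", "D614G"},
    "virusD": {"D614G"},
}

summary = position_summary(cluster, 484)
for mutation, (n, freq) in sorted(summary.items()):
    print(mutation, n, f"{freq:.0%}")
```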


Combining commands

The command parser of vdb is still under development, so combinations of commands will work in some cases but not others. Complex queries can nevertheless be performed with vdb: variables can be used to save the results of single commands, and these can be used as input to further search commands.


Implicit commands

In a couple of situations, vdb interprets input as implying the from or lineage commands. When the first part of an expression is a country or state, this is treated as an implicit from command. When a part of an expression appears to be a Pango lineage name (containing periods) and is not preceded by the lineage command, that command is considered implied.


Output

All valid commands should print a response. For commands that involve several steps, output will be printed for each step. If no response is printed, this indicates that there is an error in the input. Some syntax errors are explicitly noted. When printed lists are longer than the terminal display, vdb will print the list one page at a time. To advance to the next page, press the space bar. To advance one line at a time, press return or down arrow. To stop printing the list, press q.


Filtering commands


<cluster> from <country or state> → cluster

Searches the specified cluster (or all viruses if no cluster is given) for viruses from the specified country or US state.



<cluster> containing [<n>] <pattern> → cluster   alias with, w/

Searches the specified cluster (or all viruses if no cluster is given) for viruses with the specified mutation pattern. By default only viruses with all the mutations of the specified pattern are returned. If an integer <n> is specified in the search command, then viruses are returned only if they have at least <n> of the mutations in the pattern.
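
The matching rule can be sketched in Python as follows. The function name containing, the dictionary layout, and the virus names are illustrative assumptions, not part of vdb.

```python
def containing(cluster, pattern, n=None):
    """Keep viruses with all pattern mutations, or at least n of them if n is given."""
    required = len(pattern) if n is None else n
    return {
        name: mutations
        for name, mutations in cluster.items()
        if len(pattern & mutations) >= required
    }

# Hypothetical data: virus name -> set of spike mutations.
cluster = {
    "virusA": {"E484K", "D614G", "N501Y"},
    "virusB": {"D614G"},
    "virusC": {"L452R"},
}
pattern = {"E484K", "D614G"}

full_match = containing(cluster, pattern)          # all mutations required
partial_match = containing(cluster, pattern, n=1)  # at least one is enough

print(sorted(full_match))     # ['virusA']
print(sorted(partial_match))  # ['virusA', 'virusB']
```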



<cluster> not containing [<n>] <pattern> → cluster   alias without, w/o (full pattern)

Searches the specified cluster (or all viruses if no cluster is given) for viruses without the specified mutation pattern. All viruses are returned except those that contain the complete mutation pattern. If an integer <n> is specified in the search command, then viruses are returned only if they have fewer than <n> of the mutations in the pattern.
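
A minimal Python sketch of this exclusion rule, using a hypothetical function not_containing and invented data (neither is part of vdb):

```python
def not_containing(cluster, pattern, n=None):
    """Keep viruses lacking the full pattern, or having fewer than n of its mutations."""
    limit = len(pattern) if n is None else n
    return {
        name: mutations
        for name, mutations in cluster.items()
        if len(pattern & mutations) < limit
    }

# Hypothetical data: virus name -> set of spike mutations.
cluster = {
    "virusA": {"E484K", "D614G", "N501Y"},
    "virusB": {"D614G"},
    "virusC": {"L452R"},
}
pattern = {"E484K", "D614G"}

lacking_full = not_containing(cluster, pattern)    # excludes only complete matches
lacking_any = not_containing(cluster, pattern, n=1)  # fewer than 1, i.e. none

print(sorted(lacking_full))  # ['virusB', 'virusC']
print(sorted(lacking_any))   # ['virusC']
```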



<cluster> before <date> → cluster

Searches the specified cluster (or all viruses if no cluster is given) for viruses with collection date before the specified date.



<cluster> after <date> → cluster

Searches the specified cluster (or all viruses if no cluster is given) for viruses with collection date after the specified date.



<cluster> > or < or # <n> → cluster   filter by # of mutations

Searches the specified cluster (or all viruses if no cluster is given) for viruses with greater than (or less than, or equal to) the specified number of mutations.



<cluster> named <state_id or EPI_ISL#> → cluster

Searches the specified cluster (or all viruses if no cluster is given) for viruses with the specified text string in their virus name field. Or, if a number is specified, returns the virus with that accession number.



<cluster> lineage <Pango lineage> → cluster

Searches the specified cluster (or all viruses if no cluster is given) for viruses belonging to the specified Pango lineage. A program switch determines whether viruses in sublineages are returned (by default sublineages are included). Lineage names with periods are autodetected, so the keyword lineage can be omitted in combined commands.
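
The effect of the sublineage switch can be pictured with the simplified Python sketch below. It assumes that a sublineage is any name that extends the queried name with a further period-separated part. Real Pango nomenclature also uses aliases (for example, AY lineages descend from B.1.617.2), which this sketch ignores, and vdb's own logic may differ. The function name and the lineage strings are illustrative only.

```python
def in_lineage(virus_lineage, query, include_sublineages=True):
    """Return True if the lineage matches the query under a simple prefix rule."""
    if virus_lineage == query:
        return True
    # Appending "." prevents B.1.1.70 from matching a query for B.1.1.7.
    return include_sublineages and virus_lineage.startswith(query + ".")

# Illustrative lineage strings, used only to exercise the rule.
lineages = ["B.1.1.7", "B.1.1.70", "B.1.1.7.1", "B.1.351"]

with_sub = [name for name in lineages if in_lineage(name, "B.1.1.7")]
without_sub = [
    name for name in lineages
    if in_lineage(name, "B.1.1.7", include_sublineages=False)
]

print(with_sub)     # ['B.1.1.7', 'B.1.1.7.1']
print(without_sub)  # ['B.1.1.7']
```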



Commands to find mutation patterns


consensus [for] <cluster or country or state> → pattern

Returns the consensus mutation pattern for the specified cluster. Any mutation present in greater than 50% of the members of the cluster will be included in the consensus list.
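
The greater-than-50% rule can be sketched in Python as follows. The function name and data are hypothetical. Note that a mutation present in exactly half of the viruses is excluded.

```python
from collections import Counter

def consensus(cluster):
    """Return the mutations present in more than 50% of the cluster's viruses."""
    counts = Counter(m for mutations in cluster.values() for m in mutations)
    half = len(cluster) / 2
    return {mutation for mutation, n in counts.items() if n > half}

# Hypothetical data: virus name -> set of spike mutations.
cluster = {
    "v1": {"D614G", "E484K", "N501Y"},
    "v2": {"D614G", "E484K", "N501Y"},
    "v3": {"D614G", "E484K"},
    "v4": {"D614G", "L452R"},
}

result = consensus(cluster)
# D614G is in 4 of 4 and E484K in 3 of 4; N501Y is in exactly 2 of 4, so it is left out.
print(sorted(result))  # ['D614G', 'E484K']
```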



patterns [in] [<n>] <cluster> → pattern

Prints a list of the most frequent mutation patterns (indicating number of occurrences) in the specified cluster, and returns the most frequent pattern for assignment to a variable. If Pango lineage metadata has been loaded, then for each pattern, the most frequent lineage of viruses with that pattern is listed along with the percentage belonging to that lineage.
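
A rough Python sketch of ranking patterns by frequency is given below. It assumes that a virus's pattern is simply its complete set of mutations. How vdb defines and ranks patterns internally is not specified here, and the function name and data are hypothetical.

```python
from collections import Counter

def most_frequent_patterns(cluster, top=3):
    """Rank complete mutation sets by how many viruses carry exactly that set."""
    counts = Counter(frozenset(mutations) for mutations in cluster.values())
    return counts.most_common(top)

# Hypothetical data: virus name -> set of spike mutations.
cluster = {
    "v1": {"D614G", "E484K"},
    "v2": {"D614G", "E484K"},
    "v3": {"D614G"},
}

ranked = most_frequent_patterns(cluster)
for pattern, n in ranked:
    print(n, " ".join(sorted(pattern)))

# The top-ranked pattern is the one that would be returned for assignment.
top_pattern = ranked[0][0]
```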


Listing commands


list [<n>] <cluster>

Lists viruses belonging to the specified cluster along with the mutation pattern of each virus. By default at most 20 members of the cluster are listed. If an integer is specified, then at most that number of members of the cluster are listed. A program switch controls whether the accession number is printed. By default the accession number is not printed.



[list] countries [for] <cluster>

Lists the countries for the viruses belonging to the specified cluster. The number of viruses for each country is printed after the country name.



[list] states [for] <cluster>

Lists the states for the viruses belonging to the specified cluster.



[list] lineages [for] <cluster>

Lists the Pango lineages of the viruses belonging to the specified cluster. The number of viruses for each lineage is printed after the lineage name. Sublineages are not included in this count.



[list] trends [for] <cluster>

For the Pango lineages with the highest counts in the specified cluster, this calculates how the fractions of these lineages have changed over time. This information is given as a table and optionally as a graph. Graphs are generated by gnuplot. Sublineages are not included in these calculations unless specified by the group lineages command.



[list] frequencies [for] <cluster> alias freq

Lists the frequencies of individual mutations among the viruses belonging to the specified cluster.



[list] monthly [for] <cluster> [<cluster2>]

Lists by month the number of viruses belonging to the specified cluster with a collection date within that month. If a second cluster is specified, then the monthly numbers for that cluster are also listed along with the percentage of the first cluster count vs. the second cluster count. The first cluster should generally be a subset of the second cluster, if present.



[list] weekly [for] <cluster> [<cluster2>]

Lists by week the number of viruses belonging to the specified cluster with a collection date within that week. If a second cluster is specified, then the weekly numbers for that cluster are also listed along with the percentage of the first cluster count vs. the second cluster count. The first cluster should generally be a subset of the second cluster, if present.



[list] patterns

Lists the built-in and user-defined patterns.



[list] clusters

Lists the built-in and user-defined clusters.



[list] proteins

Lists the SARS-CoV-2 proteins and their gene positions.



Other commands


sort <cluster>

Sorts the specified cluster by sample collection date.



help [<command>] alias ?

Prints a list of vdb commands or a description of a specific command.



license

Prints the license information for vdb.



history

Lists the user-entered commands for the current vdb session.



load <vdb database file>

Loads the specified vdb database file.



char <Pango lineage> alias characteristics

Prints characteristic (consensus) mutations of the specified lineage. Mutations are shown in bold if they are not present in the parent lineage consensus pattern. This command does not include sublineages in its analysis.



testvdb

Runs built-in tests of vdb.



group lineages <lineage name(s) or named cluster> alias group lineage, lineage group

Designates which lineages should be grouped and displayed in the trends tables and graphs. If a single lineage name is given, then all sublineages will be counted as part of that lineage. If multiple lineages are listed, those will be counted under the first lineage name. If a defined cluster is given, viruses in that cluster will be counted under that cluster's name, not as part of their own lineage.



lineage groups

Lists defined lineage groups. These are used to control the tables and graphs generated by the trends command.



clear <cluster name> or <lineage group>

Clears the definition of a variable assigned to a cluster or pattern, or the definition of a lineage group that was created by the group lineages command and is used by the trends command.



reset

Resets the program to default settings.



settings

Prints the current state of program settings.



count <cluster name or pattern name>

Prints the number of viruses in a named cluster or the number of mutations in a named pattern.



// [<comment>]

A comment line, which is ignored.



quit alias exit, control-C, control-D

Ends the current vdb session.



Program settings


debug/debug off

Controls whether debug information regarding tokenizing, parsing, and evaluating commands is printed. By default debug printing is off.



listAccession/listAccession off

Controls whether accession numbers are printed by the list command. By default printing of accession numbers is off.



listAverageMutations/listAverageMutations off

Controls whether the average number of mutations is listed for the monthly and weekly commands. By default this is off.



includeSublineages/includeSublineages off/excludeSublineages

Controls whether sublineages are included in the lineage search command. By default the switch is on, so sublineages are included.



sixel/sixel off

Controls whether trends graphs are displayed on the terminal using sixel graphics.



trendGraphs/trendGraphs off

Controls whether the trends command produces graphical output. By default graphing is on.



stackGraphs/stackGraphs off

Controls whether graphs produced by the trends command are plotted as stacked graphs vs. line graphs. By default stackGraphs is on.



completions/completions off

Controls whether tab completions and hints are offered on the command line. By default completions is on.



displayTextWithColor/displayTextWithColor off

Controls whether text is printed on the terminal using colors via ANSI escape codes. If vdb output is being redirected to a file, plain text output may be preferred. By default displayTextWithColor is on.



minimumPatternsCount = <n>

Sets the minimum number of mutations for the patterns command. The default value is 0.



trendsLineageCount = <n>

Sets the number of lineages to include in tables and graphs generated by the trends command. The default value is 5.



Version 1.5