

A reference sequence is a sequence file used as a reference to describe the variants present in an analysed sequence.

NOTE: this section has been updated based on the accepted proposal SVD-WG008 (Reference Sequences).

A sequence variant is defined in the context of a reference sequence, which must be referred to by means of a unique sequence identifier. Because a reference sequence defines the numbering system and default state of a sequence (e.g. coding transcript, non-coding transcript), accurately interpreting a sequence variant requires that both the reference sequence and its corresponding identifier are unchangeable.

- Reference sequences must come from data sources that provide stable and permanent identifiers, i.e. a source that permits updating of sequence records associated with an existing sequence identifier must not be used.
  - A change in the reference sequence must trigger a change in the sequence identifier.
  - Rationale: violating this requirement means that the interpretation of a variant might change over time.
- Reference sequences must use conventional representation, i.e. the sequence comprises a string of IUPAC codes that represents a nucleic acid or amino acid sequence using the conventional order (5'-to-3' for nucleic acid sequences, and amino-to-carboxyl for amino acid sequences).
- A reference sequence must be contiguous.
  - Undefined sequence is not permissible.
  - This requirement applies within a single sequence. Alignments between sequences may contain gaps. For example, a coding sequence will contain intron gaps when aligned to a genomic sequence.
  - IUPAC codes for any nucleotide (N) or any amino acid (X) are permitted within a contiguous sequence, e.g. within chromosomal reference sequences, and are not considered undefined.
- A sequence identifier must only ever identify one reference sequence, and the sequence referred to by a sequence identifier may not be deleted or changed.
- Sequence identifiers are opaque (note 1), i.e. the structure and meaning of an identifier is determined by the source reference sequence database.
- Versioned reference sequence identifiers are required only when the reference sequence database uses versioning to distinguish between unique sequences.
  - RefSeq and Ensembl reference sequence identifiers use version numbers to distinguish between sequences. In the context of these reference sequences, variant descriptions lacking a version number are not valid.
  - NM_004006.3 is correct; NM_004006 is not correct (it lacks the essential version number).
- The sequence identifier must be included in all representations of a reference sequence, i.e. annotated records and downloadable formats such as FASTA files.
- Only reference sequences considered to be "complete" (as defined in the bullet points below) are suitable for defining sequence variation. The reference sequence database must provide a mechanism which allows simple and definitive identification of "complete" sequences.
  - The first three nucleotides of the CDS must be clearly annotated within the reference sequence record.
  - The translation termination codon must be clearly annotated within the reference sequence record.
- Predicted or inferred sequences should not be used for variant reporting.
- If a reference sequence becomes unsupported or refuted by evidence, it should no longer be used.
- A ":" (colon) is used as a separator between the reference sequence file identifier (accession.version-number) and the actual description of a variant, e.g. NC_000011.9:g.12345611G>A.
- The recommended reference is a genomic reference sequence based on a recent genome build.
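The colon separator and the version-number requirement can be illustrated with a minimal Python sketch. The function names and the regular expression below are assumptions made for illustration only. Identifiers are opaque, so the pattern is a rough approximation of RefSeq-style and Ensembl-style versioned identifiers, not an official grammar.

```python
import re
from typing import Tuple

# Assumption: RefSeq-style accessions (two letters, underscore, digits) and
# Ensembl-style stable IDs (ENS..., digits) carry a ".<version>" suffix.
# This pattern is illustrative and does not cover every database.
VERSIONED_ID = re.compile(r"^(?:[A-Z]{2}_\d+|ENS[A-Z]*\d+)\.\d+$")


def split_description(description: str) -> Tuple[str, str]:
    """Split '<identifier>:<variant>' at the first colon."""
    identifier, separator, variant = description.partition(":")
    if not separator or not identifier or not variant:
        raise ValueError(
            f"expected '<identifier>:<variant>', got {description!r}"
        )
    return identifier, variant


def has_version(identifier: str) -> bool:
    """Return True if the identifier ends in a version number."""
    return VERSIONED_ID.match(identifier) is not None


identifier, variant = split_description("NC_000011.9:g.12345611G>A")
print(identifier, variant, has_version(identifier))
print(has_version("NM_004006"))  # no version number, so not valid here
```

A check like this only makes sense for databases that use versioning. For other sources, the identifier should be passed through unchanged rather than parsed.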

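The contiguity requirement for a single nucleic acid sequence can be sketched in the same way. The helper name below is hypothetical. The check accepts the IUPAC nucleotide codes, including N for any nucleotide, and rejects gap characters or an empty sequence. It is a simplified illustration that does not cover amino acid sequences.

```python
# IUPAC nucleotide codes, including the ambiguity codes and N (any nucleotide).
IUPAC_NUCLEOTIDES = frozenset("ACGTURYSWKMBDHVN")


def is_contiguous_nucleotide(sequence: str) -> bool:
    """Return True if the sequence is non-empty and uses only IUPAC codes.

    Gap characters such as '-' or '.' make the sequence non-contiguous,
    while runs of N are permitted and do not count as undefined.
    """
    if not sequence:
        return False
    return all(base in IUPAC_NUCLEOTIDES for base in sequence.upper())


print(is_contiguous_nucleotide("ACGTNNNNACGT"))  # N is permitted
print(is_contiguous_nucleotide("ACGT--ACGT"))    # gaps are not
```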