Read gene annotations in GTF format into a data frame. The source can be a URL, a GTF file on disk, or a GENCODE release version.
Usage
read_gtf(
  path,
  attributes = c("gene_id"),
  tags = character(0),
  features = c("gene"),
  keep_attribute_column = FALSE,
  backup_url = NULL,
  timeout = 300
)

read_gencode_genes(
  dir,
  release = "latest",
  annotation_set = c("basic", "comprehensive"),
  gene_type = "lncRNA|protein_coding|IG_.*_gene|TR_.*_gene",
  attributes = c("gene_id", "gene_type", "gene_name"),
  tags = character(0),
  features = c("gene"),
  timeout = 300
)

read_gencode_transcripts(
  dir,
  release = "latest",
  transcript_choice = c("MANE_Select", "Ensembl_Canonical", "all"),
  annotation_set = c("basic", "comprehensive"),
  gene_type = "lncRNA|protein_coding|IG_.*_gene|TR_.*_gene",
  attributes = c("gene_id", "gene_type", "gene_name", "transcript_id"),
  features = c("transcript", "exon"),
  timeout = 300
)
Arguments
- path
Path to the GTF file (or the desired save location if backup_url is used).
- attributes
Vector of GTF attribute names to parse out as columns.
- tags
Vector of tags to parse out as logical (presence/absence) columns.
- features
Vector of feature types to keep from the GTF (e.g. gene, transcript, exon, intron).
- keep_attribute_column
Logical; whether to keep the raw attribute text column.
- backup_url
URL from which to download the GTF if path does not exist.
- timeout
Maximum time in seconds to wait for the download from backup_url.
- dir
Directory in which to cache the downloaded GTF file.
- release
GENCODE release version (prefix with M for mouse releases). For the most recent release, use "latest" (human) or "latest_mouse" (mouse).
- annotation_set
Annotation set to use: either "basic" or "comprehensive" (see Details).
- gene_type
Regular expression matching the gene types to keep. Defaults to protein_coding, lncRNA, and IG/TR genes.
- transcript_choice
Method for selecting representative transcripts. Choices are:
MANE_Select: human only; the most conservative choice
Ensembl_Canonical: human and mouse; a superset of MANE_Select for human
all: keep all transcript models (not recommended for plotting)
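The attributes and tags arguments both draw on the ninth GTF column, which holds key "value"; pairs, with each tag stored as a repeated tag "..." entry. The sketch below shows one way that text could be split into columns using base R regular expressions. It is an illustration under assumptions, not this package's implementation: parse_gtf_attribute() and has_gtf_tag() are hypothetical helpers, and the attribute strings are made-up examples.

```r
# Hypothetical helper: extract one attribute (e.g. "gene_id") from each
# GTF attribute string of the form  key "value"; key value; ...
# Returns NA where the attribute is absent.
parse_gtf_attribute <- function(attr_text, name) {
  pattern <- paste0('(^|; *)', name, ' "?([^";]*)"?')
  m <- regmatches(attr_text, regexec(pattern, attr_text))
  # m[[i]] is c(full_match, group1, group2); group2 holds the value
  vapply(m, function(x) if (length(x) == 0) NA_character_ else x[3], character(1))
}

# Hypothetical helper: TRUE where the string contains  tag "<tag>"
has_gtf_tag <- function(attr_text, tag) {
  grepl(paste0('(^|; *)tag "', tag, '"'), attr_text)
}

# Made-up attribute strings for illustration
attr_text <- c(
  'gene_id "G1"; transcript_id "T1"; gene_name "GeneA"; tag "basic"; tag "MANE_Select";',
  'gene_id "G1"; transcript_id "T2"; gene_name "GeneA"; tag "basic";',
  'gene_id "G2"; gene_name "GeneB"; level 2;'
)
attributes <- c("gene_id", "transcript_id", "level")
tags <- c("basic", "MANE_Select")

# One character column per attribute, one logical column per tag
parsed <- data.frame(
  lapply(setNames(attributes, attributes), function(a) parse_gtf_attribute(attr_text, a)),
  lapply(setNames(tags, tags), function(tg) has_gtf_tag(attr_text, tg))
)
parsed
```

Missing attributes become NA, while tags are always TRUE or FALSE, which matches the presence/absence description of tags above.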
Value
Data frame with coordinates using the 0-based convention. Columns are:
chr
source
feature
start
end
score
strand
frame
attributes (optional; named according to listed attributes)
tags (named according to listed tags)
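GTF files store coordinates as 1-based and inclusive on both ends. The sketch below assumes that "0-based" here means the BED-style 0-based, half-open convention [start, end), in which case only the start coordinate shifts. gtf_to_zero_based() is a hypothetical helper for illustration, not part of this package.

```r
# Hypothetical helper. Assumption: output uses BED-style 0-based,
# half-open intervals, so start moves down by one and end is unchanged.
gtf_to_zero_based <- function(df) {
  df$start <- df$start - 1L
  df
}

# Made-up GTF-style rows: 1-based, inclusive [start, end]
gtf <- data.frame(chr = "chr1", start = c(1L, 101L), end = c(100L, 250L))
bed <- gtf_to_zero_based(gtf)

# Feature lengths agree under both conventions:
# inclusive: end - start + 1; half-open: end - start
gtf$end - gtf$start + 1L
bed$end - bed$start
```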
Details
read_gtf
Read a GTF file from a local path or a URL.
read_gencode_genes
Read gene annotations directly from GENCODE. The file name varies with the release and annotation set requested, but follows the format gencode.v42.annotation.gtf.gz. GENCODE currently recommends the basic set: https://www.gencodegenes.org/human/. In release 42, the comprehensive and basic sets had identical gene-level annotations, but the comprehensive set annotated additional transcript variants.
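The default gene_type argument is a regular expression over GENCODE gene type names. The sketch below shows which types that pattern would keep when applied with base R grepl(). The assumption is that the package matches the pattern in a similar unanchored way (a type is kept if the pattern matches anywhere in it); the gene type list is a small illustrative sample.

```r
# Default gene_type pattern from the read_gencode_genes() usage above
gene_type_pattern <- "lncRNA|protein_coding|IG_.*_gene|TR_.*_gene"

# Illustrative sample of GENCODE gene type names
gene_types <- c("protein_coding", "lncRNA", "IG_V_gene", "TR_J_pseudogene",
                "miRNA", "processed_pseudogene")

# grepl() searches without anchoring. Note that "TR_J_pseudogene" is
# dropped: "TR_.*_gene" needs an underscore directly before "gene".
keep <- grepl(gene_type_pattern, gene_types)
gene_types[keep]
# [1] "protein_coding" "lncRNA"         "IG_V_gene"
```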
read_gencode_transcripts
Read transcript models from GENCODE, for use with trackplot_gene().
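GENCODE marks representative transcripts with tags in the GTF (tag "MANE_Select" and tag "Ensembl_canonical", the latter spelled with a lowercase "c" in GENCODE files). The sketch below assumes that transcript_choice works by filtering on such tags, parsed as logical columns. select_transcripts() is a hypothetical helper, and the transcript models are made-up rows.

```r
# Hypothetical helper. Assumption: the tag columns are logical
# presence/absence flags parsed from the GTF attribute column.
select_transcripts <- function(df,
                               transcript_choice = c("MANE_Select", "Ensembl_Canonical", "all")) {
  transcript_choice <- match.arg(transcript_choice)
  if (transcript_choice == "all") return(df)
  # Map the argument value to the GENCODE tag name
  tag_col <- c(MANE_Select = "MANE_Select",
               Ensembl_Canonical = "Ensembl_canonical")[[transcript_choice]]
  df[df[[tag_col]], , drop = FALSE]
}

# Made-up transcript models: T1 is MANE Select, T3 is only Ensembl canonical
models <- data.frame(
  feature = c("transcript", "exon", "transcript", "exon", "transcript", "exon"),
  transcript_id = c("T1", "T1", "T2", "T2", "T3", "T3"),
  MANE_Select = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  Ensembl_canonical = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
)

unique(select_transcripts(models, "MANE_Select")$transcript_id)
unique(select_transcripts(models, "Ensembl_Canonical")$transcript_id)
```

Because both transcript and exon rows carry the transcript's tags, this filtering keeps each selected transcript together with its exons. That is consistent with Ensembl_Canonical being a superset of MANE_Select.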