Read gene annotations from GTF format into a data frame. The source can be a URL, a GTF file on disk, or a GENCODE release version.
Usage
read_gtf(
path,
attributes = c("gene_id"),
tags = character(0),
features = c("gene"),
keep_attribute_column = FALSE,
backup_url = NULL,
timeout = 300
)
read_gencode_genes(
dir,
release = "latest",
annotation_set = c("basic", "comprehensive"),
gene_type = "lncRNA|protein_coding|IG_.*_gene|TR_.*_gene",
attributes = c("gene_id", "gene_type", "gene_name"),
tags = character(0),
features = c("gene"),
timeout = 300
)
read_gencode_transcripts(
dir,
release = "latest",
transcript_choice = c("MANE_Select", "Ensembl_Canonical", "all"),
annotation_set = c("basic", "comprehensive"),
gene_type = "lncRNA|protein_coding|IG_.*_gene|TR_.*_gene",
attributes = c("gene_id", "gene_type", "gene_name", "transcript_id"),
features = c("transcript", "exon"),
timeout = 300
)
Arguments
- path
Path to file (or desired save location if backup_url is used)
- attributes
Vector of GTF attribute names to parse out as columns
- tags
Vector of tags to parse out as boolean presence/absence
- features
List of feature types to keep from the GTF (e.g. gene, transcript, exon, intron)
- keep_attribute_column
Boolean for whether to preserve the raw attribute text column
- backup_url
URL to download the GTF from if path does not exist
- timeout
Maximum time in seconds to wait for download from backup_url
- dir
Output directory to cache the downloaded gtf file
- release
Release version (prefix with M for mouse versions). For the most recent version, use "latest" or "latest_mouse"
- annotation_set
Either "basic" or "comprehensive" annotation sets (see details section).
- gene_type
Regular expression matching the gene types to keep. Defaults to protein_coding, lncRNA, and IG/TR genes
- transcript_choice
Method for selecting representative transcripts. Choices are:
MANE_Select: human-only, most conservative
Ensembl_Canonical: human+mouse, superset of MANE_Select for human
all: Preserve all transcript models (not recommended for plotting)
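To make the default gene_type filter concrete, the sketch below applies the pattern from the signature to a handful of gene type labels using base R. It assumes the pattern is matched as an unanchored regular expression; the source does not say exactly how the functions apply it, and the list of labels is an illustrative selection rather than the full GENCODE vocabulary.

```r
# Default pattern copied from the function signature
gene_type <- "lncRNA|protein_coding|IG_.*_gene|TR_.*_gene"

# Illustrative gene type labels (not an exhaustive list)
types <- c("protein_coding", "lncRNA", "IG_V_gene", "TR_J_gene",
           "IG_V_pseudogene", "processed_pseudogene", "miRNA")

# Assumption: unanchored regex match, as grepl() does by default
keep <- grepl(gene_type, types)
kept_types <- types[keep]
print(kept_types)
```

Pseudogene labels such as IG_V_pseudogene are excluded because the pattern requires a literal "_gene" suffix segment. A narrower pattern such as "protein_coding" can be passed to restrict the result further.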
Value
Data frame with coordinates using the 0-based convention. Columns are:
chr
source
feature
start
end
score
strand
frame
attributes (optional; named according to listed attributes)
tags (named according to listed tags)
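As a rough illustration of how one GTF record could map onto these columns, the base R sketch below parses a single made-up line. It is not the package's implementation. It assumes the 0-based convention means end-exclusive intervals (so only start shifts by one), which the text above does not spell out, and the helpers get_attribute() and has_tag() are hypothetical names introduced here.

```r
# One made-up GTF record: 9 tab-separated fields, 1-based inclusive coordinates
line <- paste(
  "chr1", "toy", "gene", "1000", "2000", ".", "+", ".",
  'gene_id "GENE0001"; gene_name "EXAMPLE1"; tag "example_tag";',
  sep = "\t"
)

fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
attr_text <- fields[9]

# Hypothetical helper: quoted value of one attribute key, or NA if absent
get_attribute <- function(text, key) {
  m <- regmatches(text, regexec(paste0(key, ' "([^"]*)"'), text))[[1]]
  if (length(m) == 2) m[2] else NA_character_
}

# Hypothetical helper: TRUE if the record carries the given tag
has_tag <- function(text, tag) {
  grepl(paste0('tag "', tag, '"'), text, fixed = TRUE)
}

row <- data.frame(
  chr = fields[1],
  source = fields[2],
  feature = fields[3],
  start = as.integer(fields[4]) - 1L,  # assumed shift from 1-based to 0-based
  end = as.integer(fields[5]),         # unchanged if intervals are end-exclusive
  score = fields[6],
  strand = fields[7],
  frame = fields[8],
  gene_id = get_attribute(attr_text, "gene_id"),
  gene_name = get_attribute(attr_text, "gene_name"),
  example_tag = has_tag(attr_text, "example_tag")
)
print(row)
```

Under these assumptions, end - start equals the feature length in bases, which is the usual reason for adopting a 0-based, end-exclusive layout.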
Details
read_gtf
Read a GTF from a file or URL
read_gencode_genes
Read gene annotations directly from GENCODE. The file name will vary depending
on the release and annotation set requested, but will be of the format
gencode.v42.annotation.gtf.gz. GENCODE currently recommends the basic set:
https://www.gencodegenes.org/human/. In release 42, both the comprehensive and
basic sets had identical gene-level annotations, but the comprehensive set had
additional transcript variants annotated.
read_gencode_transcripts
Read transcript models from GENCODE, for use with trackplot_gene()
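A minimal usage sketch follows. It writes a two-line toy GTF to a temporary file and reads the gene record back with read_gtf(), using only arguments shown in the Usage section. It assumes these functions are provided by the BPCells package (the text above mentions trackplot_gene() but does not name the package), so the call is guarded and skipped when the package is not installed. The GENCODE calls are left as comments because they download data.

```r
# Toy annotation with made-up identifiers, for illustration only
gtf_lines <- c(
  paste("chr1", "toy", "gene", "1000", "2000", ".", "+", ".",
        'gene_id "GENE0001"; gene_name "EXAMPLE1";', sep = "\t"),
  paste("chr1", "toy", "transcript", "1000", "2000", ".", "+", ".",
        'gene_id "GENE0001"; transcript_id "TX0001"; gene_name "EXAMPLE1";',
        sep = "\t")
)
gtf_path <- tempfile(fileext = ".gtf")
writeLines(gtf_lines, gtf_path)

# Assumption: the functions documented here live in the BPCells package
if (requireNamespace("BPCells", quietly = TRUE)) {
  genes <- BPCells::read_gtf(
    gtf_path,
    attributes = c("gene_id", "gene_name"),
    features = "gene"
  )
  print(genes)

  # Requires network access; dir is where the downloaded GTF is cached
  # genes <- BPCells::read_gencode_genes(tempdir(), release = "42")
  # transcripts <- BPCells::read_gencode_transcripts(tempdir(), release = "42")
}
```

Pointing dir at a persistent directory instead of tempdir() should let later calls reuse the cached download.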
