Resources & References
Databases, communities, and ongoing learning resources
Overview
This page compiles essential databases, online communities, and additional resources to support your bioinformatics journey. Bookmark this page for quick reference!
Genomics Databases
Reference Genomes & Annotations
Ensembl
What it is: Genome browser for vertebrate genomes
Use for: Browsing genes, transcripts, variants, regulatory elements
Link: ensembl.org
Features: Genome browser, BioMart data mining, REST API
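The REST API can be queried from a script. Below is a minimal sketch using only the Python standard library and the public `lookup/symbol` endpoint at rest.ensembl.org. The helper names `build_lookup_url` and `lookup_gene` are illustrative, not part of any Ensembl client library.

```python
import json
import urllib.parse
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def build_lookup_url(symbol, species="homo_sapiens"):
    """Build the REST URL that looks up a gene by its symbol."""
    path = f"/lookup/symbol/{urllib.parse.quote(species)}/{urllib.parse.quote(symbol)}"
    return f"{ENSEMBL_REST}{path}?content-type=application/json"


def lookup_gene(symbol, species="homo_sapiens", timeout=30):
    """Fetch the gene record as a dict (needs network access)."""
    url = build_lookup_url(symbol, species)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(build_lookup_url("APOE"))
```

Calling `lookup_gene("APOE")` should return a dictionary describing the gene. Check the API documentation for the fields currently returned and for rate limits before running large batches.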
UCSC Genome Browser
What it is: Web-based genome browser
Use for: Visualizing genomic data, downloading reference files
Link: genome.ucsc.edu
Features: Custom tracks, table browser, liftOver tool
NCBI RefSeq
What it is: Curated collection of genomic, transcript, and protein sequences
Use for: Reference sequences, gene information
Link: ncbi.nlm.nih.gov/refseq
Variant Databases
dbSNP
What it is: Database of single nucleotide polymorphisms and short variants
Use for: Variant IDs (rs numbers), allele frequencies
Link: ncbi.nlm.nih.gov/snp
gnomAD (Genome Aggregation Database)
What it is: Population allele frequency database aggregating exome and genome sequences (more than 140,000 individuals in v2.1; later releases are larger)
Use for: Determining if variants are common or rare, filtering
Link: gnomad.broadinstitute.org
Key feature: Allele frequencies across diverse populations
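Filtering on population frequency usually comes down to a threshold test. Below is a minimal sketch, assuming variants have already been annotated with an allele frequency under a hypothetical `af` key. The 1% cutoff is a common convention, not a gnomAD rule, and the records are invented for illustration.

```python
def is_rare(allele_frequency, threshold=0.01):
    """Return True when the population frequency is below the threshold.

    A missing frequency (None) is treated as rare here, on the assumption
    that the variant was not observed in the reference population.
    """
    return allele_frequency is None or allele_frequency < threshold


def filter_rare(variants, threshold=0.01):
    """Keep only the variants whose 'af' value passes is_rare()."""
    return [v for v in variants if is_rare(v.get("af"), threshold)]


# Hypothetical annotated variants (IDs and frequencies are made up)
variants = [
    {"id": "var1", "af": 0.25},
    {"id": "var2", "af": 0.0004},
    {"id": "var3", "af": None},
]

rare = filter_rare(variants)
print([v["id"] for v in rare])  # ['var2', 'var3']
```

Whether an unobserved variant should count as rare depends on coverage at that site, so treat the `None` handling as a choice to revisit for your own data.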
ClinVar
What it is: Public archive of relationships between variants and phenotypes
Use for: Clinical significance of variants, pathogenicity
Link: ncbi.nlm.nih.gov/clinvar
COSMIC (Catalogue of Somatic Mutations in Cancer)
What it is: Database of somatic mutations in cancer
Use for: Cancer genomics, somatic variant annotation
Link: cancer.sanger.ac.uk/cosmic
GWAS & Association Studies
GWAS Catalog
What it is: Curated collection of published GWAS results
Use for: Finding known genetic associations, exploring traits
Link: ebi.ac.uk/gwas
Key feature: Searchable by trait, gene, variant
PGS Catalog (Polygenic Score Catalog)
What it is: Open database of published polygenic scores
Use for: Calculating polygenic risk scores, risk prediction
Link: pgscatalog.org
Includes: Calculator tool, score downloads
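At its core, a polygenic score is a weighted sum: each variant's published effect weight multiplied by the number of effect alleles a person carries. Below is a minimal sketch of that arithmetic, with hypothetical variant IDs, weights, and dosages. Real scoring files also need allele matching and strand checks, which are left out here.

```python
def polygenic_score(weights, dosages):
    """Sum weight * dosage over the variants in a score.

    weights: dict mapping variant ID -> effect weight
    dosages: dict mapping variant ID -> effect-allele count (0, 1, or 2)
    Returns the score and the list of variants missing from the genotypes.
    """
    score = 0.0
    missing = []
    for variant_id, weight in weights.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            missing.append(variant_id)  # not genotyped, so it adds nothing
            continue
        score += weight * dosage
    return score, missing


# Hypothetical score definition and one person's genotypes
weights = {"variant_a": 0.5, "variant_b": -0.25, "variant_c": 0.1}
dosages = {"variant_a": 2, "variant_b": 1}

score, missing = polygenic_score(weights, dosages)
print(score)    # 0.75
print(missing)  # ['variant_c']
```

Reporting the missing variants matters because a score computed from only part of its variants is not directly comparable with published distributions.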
OpenTargets
What it is: Platform integrating genetics and genomics for target identification
Use for: Connecting genetic evidence to drug targets
Link: platform.opentargets.org
Dementia-Specific Resources
AlzGene
What it is: Field synopsis of genetic association studies in Alzheimer’s disease
Use for: Known AD risk loci, meta-analysis results
Link: alzgene.org
Note: Last updated 2014, but comprehensive for established loci
Alzforum
What it is: Networking site for Alzheimer’s research
Use for: News, mutations database, research tools
Link: alzforum.org
Features: Mutations database, biomarkers, protocols
NIAGADS (National Institute on Aging Genetics of Alzheimer’s Disease Data Storage)
What it is: Repository for AD genetics data
Use for: Accessing AD genomics datasets
Link: niagads.org
Access: Requires application/approval
Expression & Functional Databases
GTEx (Genotype-Tissue Expression)
What it is: Resource on human gene expression and regulation across tissues
Use for: Tissue-specific expression, eQTLs
Link: gtexportal.org
Includes: Brain-specific data (multiple regions)
ENCODE
What it is: Encyclopedia of DNA Elements
Use for: Regulatory elements, transcription factors, chromatin states
Link: encodeproject.org
STRING
What it is: Protein-protein interaction networks
Use for: Understanding protein interactions, pathway analysis
Link: string-db.org
Reactome
What it is: Pathway database
Use for: Pathway analysis, understanding biological processes
Link: reactome.org
Data Repositories
Sequence Read Archive (SRA)
What it is: NCBI’s repository for raw sequencing data
Use for: Downloading/uploading sequencing data
Link: ncbi.nlm.nih.gov/sra
Tool: SRA Toolkit for downloading
European Nucleotide Archive (ENA)
What it is: European equivalent of SRA
Use for: Raw sequencing data (often faster downloads in Europe)
Link: ebi.ac.uk/ena
Gene Expression Omnibus (GEO)
What it is: Repository for gene expression and genomics data
Use for: Finding published expression datasets
Link: ncbi.nlm.nih.gov/geo
dbGaP (Database of Genotypes and Phenotypes)
What it is: Controlled-access archive for human genetic/phenotypic data
Use for: Accessing large-scale human genomics studies
Link: ncbi.nlm.nih.gov/gap
Access: Requires application through your institution
European Genome-phenome Archive (EGA)
What it is: European archive for sensitive human data
Use for: Controlled-access human genomics data
Link: ega-archive.org
Online Communities & Forums
Biostars
What it is: Q&A forum for bioinformatics
Best for: Getting help with specific analysis problems
Link: biostars.org
Tip: Search before posting - your question may already be answered!
SEQanswers
What it is: Forum focused on sequencing technologies and analysis
Best for: NGS-specific discussions, troubleshooting
Link: seqanswers.com
Reddit Communities
- r/bioinformatics: General bioinformatics discussion (reddit.com/r/bioinformatics)
- r/genomics: Genomics-focused (reddit.com/r/genomics)
- r/learnbioinformatics: For learning and education (reddit.com/r/learnbioinformatics)
Twitter/X & Mastodon
Follow these tags/communities:
- #bioinformatics
- #genomics
- #scicomm
- #epitwitter (for epidemiology/public health)
Active bioinformatics community on Mastodon:
- genomic.social (Mastodon instance for genomics researchers)
Software Repositories
Bioconda
What it is: Package repository for bioinformatics software
Why essential: Easy installation of bioinformatics tools
Link: bioconda.github.io
Usage: conda install -c conda-forge -c bioconda toolname (Bioconda packages depend on the conda-forge channel)
Bioconductor
What it is: R packages for genomic data analysis
Why essential: Statistical analysis and visualization of genomics data
Link: bioconductor.org
Note: Over 2,000 packages for all types of genomic analyses
Biopython
What it is: Python tools for computational biology
Use for: Parsing biological file formats, sequence analysis
Link: biopython.org
Docs: Excellent tutorials available
GitHub Topics
Search these topics to find relevant bioinformatics tools:
- bioinformatics
- genomics
- variant-calling
- rna-seq
Learning Resources
Journals to Follow
Methods/Tools:
- Nature Methods
- Bioinformatics
- Genome Biology
- BMC Bioinformatics
- GigaScience
Disease/Dementia-focused:
- Alzheimer’s & Dementia
- Neurology
- Molecular Neurodegeneration
- Acta Neuropathologica
Preprint Servers
bioRxiv
What it is: Preprint server for biology
Why follow: See cutting-edge research before peer review
Link: biorxiv.org
Tip: Set up alerts for topics of interest
medRxiv
What it is: Preprint server for medical sciences
Link: medrxiv.org
Blogs & Websites
The Bioinformatics CRO
What it is: Blog with practical bioinformatics tutorials
Link: thebioinformaticscro.co.uk
Living in an Ivory Basement
What it is: Titus Brown’s blog on bioinformatics and open science
Link: ivory.idyll.org/blog
EMBL-EBI Training Blog
What it is: Updates on courses and bioinformatics training
Link: ebi.ac.uk/training
OmicsTutorials
What it is: Comprehensive bioinformatics tutorials and guides
Link: omicstutorials.com
YouTube Channels
StatQuest with Josh Starmer
- Excellent explanations of statistical concepts
- Covers RNA-seq, PCA, machine learning
- youtube.com/@statquest
EMBL-EBI Training
- Webinars and training videos
- youtube.com/@EBITraining
Computational Biology Core Leuven
- Bioinformatics tutorials
- youtube.com/@ComputationalBiologyCoreLeuven
Books
Bioinformatics Data Skills
Author: Vince Buffalo
Best for: Learning Unix, version control, and working with genomic data
Why read: Comprehensive practical guide to bioinformatics infrastructure
Bioinformatics Algorithms: An Active Learning Approach
Authors: Compeau & Pevzner
Best for: Understanding algorithms behind bioinformatics tools
Why read: Learn how tools actually work under the hood
Introduction to Genomics
Author: Arthur Lesk
Best for: Biological background on genomics
Why read: Understand the biology behind the data
RNA-seq Data Analysis: A Practical Approach
Editors: Eija Korpelainen et al.
Best for: Comprehensive RNA-seq analysis guide
Why read: End-to-end RNA-seq workflows with practical examples
Podcasts
The Bioinformatics Chat
What it is: Interviews with bioinformatics researchers and developers
Link: bioinformatics.chat
Great for: Staying current with methods and hearing about career paths
The Genetics Podcast
What it is: Genomics and genetics interviews
Topics: From GWAS to clinical genomics
Link: thegeneticspodcast.com
Professional Organizations
International Society for Computational Biology (ISCB)
What it is: Premier bioinformatics professional organization
Benefits: Conferences (ISMB, RECOMB), journals, student fellowships
Link: iscb.org
European Society of Human Genetics (ESHG)
What it is: Organization for human genetics professionals
Link: eshg.org
American Society of Human Genetics (ASHG)
What it is: Leading organization for human genetics
Link: ashg.org
Conferences
Major Bioinformatics Conferences
- ISMB (Intelligent Systems for Molecular Biology) - Annual, July
- RECOMB (Research in Computational Molecular Biology) - Annual, Spring
- ASHG (American Society of Human Genetics) - Annual, Fall
- ESHG (European Society of Human Genetics) - Annual, Spring
- Advances in Genome Biology and Technology (AGBT) - Annual, February
Dementia Research Conferences
- Alzheimer’s Association International Conference (AAIC) - Annual, Summer
- Clinical Trials on Alzheimer’s Disease (CTAD) - Annual, Fall
- AD/PD™ (Alzheimer’s and Parkinson’s Diseases) - Biennial
Cheat Sheets & Quick References
Command Line Cheat Sheets
Bioinformatics-Specific
Useful Tools & Utilities
File Format Converters
BAM/SAM/CRAM: SAMtools
BED/GFF/GTF: BEDtools, gffread
FASTQ quality encoding: seqtk
VCF/BCF: BCFtools
Genome Browsers (Local)
IGV (Integrative Genomics Viewer)
- Visualize alignments, variants, annotations
- software.broadinstitute.org/software/igv
JBrowse
- Web-based genome browser
- jbrowse.org
Visualization Tools
R packages:
- ggplot2 (general plotting)
- ggbio (genomic data)
- ComplexHeatmap (heatmaps)
- circlize (circular plots, genomic rearrangements)
Python:
- matplotlib/seaborn (general plotting)
- plotly (interactive plots)
- pysam (reading BAM/CRAM files to extract data for plotting)
How to Stay Current
Daily Habits
- Set up Google Scholar alerts for key topics (e.g., “Alzheimer’s genomics”)
- Follow key researchers on Twitter/Mastodon
- Subscribe to journal TOC alerts (Table of Contents)
- Check bioRxiv weekly for preprints in your area
Weekly Habits
- Read 1-2 papers in depth (not just abstracts)
- Try a new tool or technique from tutorials
- Participate in forums - answer questions when you can
- Review your analysis catalogue and project progress
Monthly Habits
- Attend a webinar or virtual conference
- Deep dive into one new method or tool
- Update your skills with a new tutorial
- Write about what you learned (blog, notes, or share with lab)
Getting Help
Before Asking Questions
- Search first: Google, Biostars, Stack Overflow
- Check documentation: Tool manuals, GitHub issues
- Try debugging yourself: Error messages often have clues
- Isolate the problem: Minimal reproducible example
How to Ask Good Questions
Include:
- What you’re trying to do
- What you’ve tried already
- Exact error messages (full output)
- Software versions
- Minimal code example
- Expected vs. actual output
Good question template:
Title: SAMtools sort fails with "truncated file" error
I'm trying to sort a BAM file but getting an error.
Command:
samtools sort input.bam -o sorted.bam
Error:
[E::bgzf_read] Read 0 bytes from input.bam; 7 expected
samtools sort: truncated file. Aborting
What I've tried:
- Checked file isn't corrupted (md5sum matches)
- Tried with different BAM files
- Reinstalled samtools
Environment:
- samtools version: 1.15
- OS: Ubuntu 20.04
- Input file size: 50GB
Any suggestions on what might be causing this?
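The Environment lines in a question like the one above can be gathered with a short script instead of typed from memory. Below is a minimal sketch using only the Python standard library. The helper names are illustrative, and it assumes each tool prints its version when given `--version`, which holds for samtools but not for every program.

```python
import platform
import shutil
import subprocess
import sys


def tool_version(command, version_flag="--version"):
    """Return the first line a tool prints for its version flag."""
    if shutil.which(command) is None:
        return "not found"
    result = subprocess.run(
        [command, version_flag], capture_output=True, text=True, timeout=30
    )
    # Some tools print version information to stderr instead of stdout
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else "installed, version unknown"


def environment_report(tools):
    """Collect Python, OS, and tool versions into a dict."""
    report = {"python": sys.version.split()[0], "os": platform.platform()}
    for tool in tools:
        report[tool] = tool_version(tool)
    return report


for name, value in environment_report(["samtools"]).items():
    print(f"- {name}: {value}")
```

Pasting this output into a post gives helpers the exact versions involved, which is often the first thing they ask for.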
Tips for Continuous Learning
- Learn by doing: Work with real data whenever possible
- Read the papers: Don’t just use tools - understand the methods
- Teach others: Best way to solidify your understanding
- Build a portfolio: Document your projects publicly
- Contribute: Report bugs, improve documentation, answer questions
- Network: Join local bioinformatics groups, attend meetups
- Stay humble: Bioinformatics is vast - no one knows everything
Conclusion
This toolkit provides a foundation, but bioinformatics is a journey of continuous learning. Use these resources, stay curious, and don’t hesitate to reach out to the community when you need help.
The best bioinformaticians are those who:
- Document their work thoroughly
- Ask questions when stuck
- Share their knowledge
- Keep learning new methods
- Stay engaged with the community
Good luck on your bioinformatics journey! 🧬
Quick Links Summary
Start Here:
- Home: Getting started guide
- Tools & Setup: Essential software
- Genomics Fundamentals: Core concepts
- Tutorials & Workshops: Learning resources
- Best Practices: Reproducible research
Essential Databases:
- Ensembl
- gnomAD
- GWAS Catalog
- PGS Catalog
Get Help:
- Biostars
- Stack Overflow
- r/bioinformatics