Reproducible Project Structure
A simple, consistent way to organise bioinformatics work from day 1
Why Consistency Matters
A consistent project structure makes work easier to understand, review, share, and continue. It reduces the time spent guessing where files live or which script produced which result.
This follows the same general logic used in our GitHub reproducibility session and the GIN-tonic approach: keep projects intentional, keep data handling explicit, and separate raw inputs from derived outputs and working code.
A GIN-Tonic-Style Project Layout
project_name/
├── README.md
├── 01-Project_management/
├── 02_material_methods/
├── 03_data/
│ ├── 001_rawdata/
│ └── 990_processed_data/
├── 04_data_analysis/
├── 05_figures/
├── 06_dissemination/
└── 07_miscellaneous/
You can extend this if needed, but the default layout should stay predictable and the top-level numbering should remain stable.
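A minimal sketch of how the layout above could be created in one step. The function name `scaffold_project` and the stub README text are illustrative assumptions, not part of any existing tool; the folder names are copied from the layout.

```python
from pathlib import Path

# Folder names copied from the layout above.
LAYOUT = [
    "01-Project_management",
    "02_material_methods",
    "03_data/001_rawdata",
    "03_data/990_processed_data",
    "04_data_analysis",
    "05_figures",
    "06_dissemination",
    "07_miscellaneous",
]


def scaffold_project(root):
    """Create the standard folders and a stub README under root (hypothetical helper)."""
    root = Path(root)
    for folder in LAYOUT:
        # exist_ok=True makes the call safe to repeat on an existing project.
        (root / folder).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():
        # Never overwrite a README that someone has already written.
        readme.write_text(
            f"# {root.name}\n\nPurpose, maintainer, data location, layout.\n",
            encoding="utf-8",
        )
    return root
```

Calling `scaffold_project("project_name")` a second time should leave existing folders and the README untouched.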
What Belongs in Each Folder
README.md
The front door to the project. State the purpose of the repository, who maintains it, where the data lives, and how the repository is organised.
01-Project_management/
Project planning material, timelines, meeting notes, analysis logs, and administrative context.
02_material_methods/
Methods notes, protocol information, sample descriptions, and other material needed to understand how the work is being done.
03_data/001_rawdata/
Original input data. Treat this as read-only. Do not manually edit files here.
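One way to back up the read-only rule is to remove write permission from the raw files once they are in place. The sketch below assumes a POSIX-style file system; `make_read_only` is a hypothetical helper name, and on shared or controlled storage the permissions may be managed elsewhere.

```python
import stat
from pathlib import Path

# Write permission bits for owner, group, and others.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(raw_dir):
    """Strip write permission from every regular file under raw_dir."""
    changed = []
    for path in sorted(Path(raw_dir).rglob("*")):
        # Skip directories and symlinks; only the data files are locked.
        if path.is_file() and not path.is_symlink():
            mode = stat.S_IMODE(path.stat().st_mode)
            path.chmod(mode & ~WRITE_BITS)
            changed.append(path)
    return changed
```

This guards against accidental edits only; it is not a substitute for backups or access control.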
03_data/990_processed_data/
Derived data created from raw inputs. Anything here should be reproducible from documented steps.
04_data_analysis/
Analysis code, scripts, and notebooks. If there is a clear execution order, use numbered file names such as 001_cleaning.R, 002_model.R, or 003_summary.qmd.
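The numbered prefixes make the intended execution order recoverable by a program as well as by a reader. As a small sketch (the helper name `ordered_scripts` is an assumption for illustration), the files can be listed in run order like this:

```python
import re
from pathlib import Path

# Matches a leading numeric prefix such as "001_" in "001_cleaning.R".
PREFIX = re.compile(r"^(\d+)_")


def ordered_scripts(analysis_dir):
    """Return the names of numbered files in analysis_dir, sorted by prefix."""
    numbered = []
    for path in Path(analysis_dir).iterdir():
        match = PREFIX.match(path.name)
        if path.is_file() and match:
            # Sort on the integer value so 010 comes after 002.
            numbered.append((int(match.group(1)), path.name))
    return [name for _, name in sorted(numbered)]
```

Files without a numeric prefix are ignored, which leaves room for helper scripts that have no fixed place in the sequence.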
05_figures/
Generated outputs such as tables, figures, reports, and summary files.
06_dissemination/
Material prepared for sharing outwardly, such as presentation files, manuscript-supporting material, or polished outputs.
07_miscellaneous/
Environment and configuration information such as environment.yml, package lists, container definitions, small helper files, and project extras that do not fit elsewhere.
Practical Rules
- keep the top-level numbered folders unchanged once a project starts
- treat 03_data/001_rawdata/ as read-only
- keep generated data separate from source data
- keep analysis code in 04_data_analysis/
- capture environment and configuration information early in 07_miscellaneous/
- document any departures from the standard structure in the README
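The first rule can be checked mechanically. The following is a rough sketch, assuming the standard layout described in this document; `missing_items` is a hypothetical name, and a project with documented departures would need its own list.

```python
from pathlib import Path

# Expected folders, copied from the standard layout in this document.
EXPECTED_FOLDERS = [
    "01-Project_management",
    "02_material_methods",
    "03_data/001_rawdata",
    "03_data/990_processed_data",
    "04_data_analysis",
    "05_figures",
    "06_dissemination",
    "07_miscellaneous",
]


def missing_items(root):
    """List the standard folders (and README.md) that are absent under root."""
    root = Path(root)
    missing = [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]
    if not (root / "README.md").is_file():
        missing.append("README.md")
    return missing
```

An empty list means the project still matches the default structure; anything else is a prompt to either restore the folder or note the departure in the README.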
Why This Helps
This kind of structure improves:
- reproducibility, because inputs, code, and outputs are clearly separated
- onboarding, because new starters can navigate a repository quickly
- collaboration, because other people can understand where to add or find material
- continuity, because the project does not depend on one person’s memory
Why This Helps a New Starter
project_name/
├── README.md
├── 01-Project_management/
├── 02_material_methods/
├── 03_data/
│ ├── 001_rawdata/
│ └── 990_processed_data/
├── 04_data_analysis/
│ ├── 001_cleaning.R
│ ├── 002_model.R
│ └── 003_summary.qmd
├── 05_figures/
├── 06_dissemination/
└── 07_miscellaneous/
  ├── environment.yml
  └── config.yml
Someone new can usually answer the key questions quickly:
- where do I start?
- which files are raw inputs?
- where does the analysis code live?
- where are outputs written?
- where is the software environment recorded?
Relationship to GitHub
Not every file in a project belongs in GitHub, especially large or controlled data. The important thing is that the repository explains:
- what data was used
- where that data lives
- what scripts were run
- where outputs are written
- what someone else would need in order to rerun or extend the work
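When the data itself cannot go into GitHub, a small checksum manifest is one way to record which files were used. The sketch below is an illustration under assumptions: the helper names and the manifest location (for example a text file in 07_miscellaneous/) are hypothetical, not an established convention of this structure.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(raw_dir, manifest_path):
    """Write one 'checksum  relative/path' line per file under raw_dir."""
    raw_dir = Path(raw_dir)
    lines = []
    for path in sorted(raw_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(raw_dir).as_posix()
            lines.append(f"{sha256_of(path)}  {rel}")
    Path(manifest_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

The manifest is small enough to commit, so a collaborator who obtains the data separately can confirm they have the same files.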
Start Simple
The goal is not to create a perfect structure up front. The goal is to start with a sensible default and avoid avoidable chaos.
If you are beginning a new repository, use Start Here first and then apply this structure immediately.
Do not wait until old projects are tidy before adopting this approach. The most useful change is to apply it to the next project you start.