Getting Started with LDA
This tutorial will walk you through installing Linked Document Analysis (LDA) and creating your first project.
Prerequisites
- Python 3.9 or higher
- uv package manager
- Basic familiarity with the command line
Installation
Install LDA using uv:
uv tool install ldanalysis
Verify the installation:
Creating Your First Project
Basic Initialization
Create a new LDA project with a descriptive name:
ldanalysis init --name "Climate Analysis 2024"
This creates:
- A project folder: climate_analysis_2024/
- A configuration file: climate_analysis_2024_config.yaml
- A playground directory: lda_playground/
- A project manifest (lda_manifest.csv) for file tracking
Initialization with Sections
Create a project with predefined sections:
ldanalysis init --name "Climate Analysis 2024" --sections "introduction,methodology"
Each section includes:
- Input and output directories
- A logs directory
- A run script (run.py by default)
- A README file
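To make the per-section layout concrete, here is a minimal Python sketch that scaffolds one section folder in the shape shown in the project structure tree. It is not ldanalysis's implementation. The helper names `section_dirname` and `scaffold_section` are hypothetical, and the `<code>_secNN_<name>` naming is inferred from the example tree in this tutorial.

```python
import tempfile
from pathlib import Path

# Hypothetical sketch of the section layout described in this tutorial;
# not the actual ldanalysis implementation.
SUBDIRS = ("inputs", "outputs", "logs")


def section_dirname(project_code: str, index: int, name: str) -> str:
    """Build a folder name such as 'climate_analysis_2024_sec01_introduction'."""
    return f"{project_code}_sec{index:02d}_{name}"


def scaffold_section(root: Path, project_code: str, index: int, name: str,
                     scripts: tuple = ("run.py",)) -> Path:
    """Create the standard directories and placeholder files for one section."""
    section = root / section_dirname(project_code, index, name)
    for sub in SUBDIRS:
        (section / sub).mkdir(parents=True, exist_ok=True)
    readme = section / "README.md"
    if not readme.exists():  # never overwrite an existing README
        readme.write_text(f"# {name}\n")
    for script in scripts:
        (section / script).touch(exist_ok=True)  # empty placeholder run script
    return section


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sec = scaffold_section(Path(tmp), "climate_analysis_2024", 1, "introduction")
        print(sec.name)  # climate_analysis_2024_sec01_introduction
        print(sorted(p.name for p in sec.iterdir()))
        # ['README.md', 'inputs', 'logs', 'outputs', 'run.py']
```

Existing files are left alone, so calling the function twice on the same section is harmless.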
Multi-Language Support
Generate both Python and R scripts:
ldanalysis init --name "Climate Analysis 2024" --language both
Options for --language:
- python: Python scripts only (default)
- r: R scripts only
- both: Python and R scripts
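The three --language values can be read as a simple lookup from option to run scripts. The sketch below assumes the mapping follows the run.py / run.R file names used in this tutorial. `scripts_for_language` is a hypothetical helper, not part of ldanalysis, and treating the values as case-sensitive is also an assumption.

```python
# Hypothetical mapping from --language values to per-section run scripts,
# based on the run.py / run.R names used in this tutorial.
LANGUAGE_SCRIPTS = {
    "python": ("run.py",),
    "r": ("run.R",),
    "both": ("run.py", "run.R"),
}


def scripts_for_language(language: str = "python") -> tuple[str, ...]:
    """Return the run scripts a section would get for a given --language value."""
    try:
        return LANGUAGE_SCRIPTS[language]
    except KeyError:
        raise ValueError(
            f"unsupported language {language!r}; expected one of {sorted(LANGUAGE_SCRIPTS)}"
        ) from None


print(scripts_for_language())        # ('run.py',)
print(scripts_for_language("both"))  # ('run.py', 'run.R')
```

Rejecting unknown values up front, rather than silently falling back to Python, makes a typo in the option visible immediately.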
Minimal Projects
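Create a project without sections or a playground:
ldanalysis init --name "Climate Analysis 2024" --no-playground
This creates only the basic project structure, with no predefined sections and no playground directory.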
Working with Projects
Check Project Status
View project information and status:
ldanalysis status
It shows:
- Project metadata
- The section list
- File counts
- Last activity
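As an illustration of where file counts like these could come from, the sketch below walks a project folder and counts the files under each section's inputs/ and outputs/. It is a hypothetical approximation, not how `ldanalysis status` works internally. The function name `section_file_counts` and the folder-matching glob `*_sec[0-9][0-9]_*` are assumptions based on the naming in the example tree.

```python
import tempfile
from pathlib import Path


def _count_files(folder: Path) -> int:
    """Count regular files below a folder; a missing folder counts as 0."""
    if not folder.is_dir():
        return 0
    return sum(1 for p in folder.rglob("*") if p.is_file())


def section_file_counts(project_dir: Path) -> dict[str, dict[str, int]]:
    """Hypothetical status view: file counts per section's inputs/ and outputs/."""
    counts = {}
    # Assumed naming pattern for section folders: <code>_secNN_<name>
    for section in sorted(project_dir.glob("*_sec[0-9][0-9]_*")):
        if section.is_dir():
            counts[section.name] = {
                sub: _count_files(section / sub) for sub in ("inputs", "outputs")
            }
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        section = Path(tmp) / "climate_analysis_2024_sec01_introduction"
        (section / "inputs").mkdir(parents=True)
        (section / "inputs" / "data.csv").write_text("year,temp\n")
        print(section_file_counts(Path(tmp)))
        # {'climate_analysis_2024_sec01_introduction': {'inputs': 1, 'outputs': 0}}
```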
Adding Sections Later
You can add sections after initialization using the sync command:
- Edit your project's config file (e.g., climate_analysis_2024_config.yaml):
sections:
  - name: introduction
    inputs: []
    outputs: []
  - name: methodology  # New section
    inputs: []
    outputs: []
- Sync the project structure:
ldanalysis sync
This creates the new section with all standard directories and scripts.
Sync with Dry Run
Preview changes before applying them:
ldanalysis sync --dry-run
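Conceptually, sync compares the sections listed in the config with the section folders already on disk and creates only the missing ones. A dry run reports that plan without touching the file system. The sketch below is a hypothetical illustration of that idea, not ldanalysis code. `sync_sections` is an invented name, the config is passed in as a plain list of section names rather than parsed from YAML, and numbering sections in config order assumes new sections are appended at the end of the list.

```python
import tempfile
from pathlib import Path


def sync_sections(project_dir: Path, project_code: str, section_names: list[str],
                  dry_run: bool = False) -> list[str]:
    """Create folders for configured sections that do not exist yet.

    With dry_run=True, only report what would be created.
    """
    planned = []
    for index, name in enumerate(section_names, start=1):
        # Assumed naming: <code>_secNN_<name>, numbered in config order.
        section = project_dir / f"{project_code}_sec{index:02d}_{name}"
        if section.exists():
            continue  # existing sections are left untouched
        planned.append(section.name)
        if dry_run:
            print(f"would create {section.name}/")
            continue
        for sub in ("inputs", "outputs", "logs"):
            (section / sub).mkdir(parents=True, exist_ok=True)
    return planned


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "demo_sec01_introduction").mkdir()
        names = ["introduction", "methodology"]
        sync_sections(root, "demo", names, dry_run=True)  # prints the plan only
        print((root / "demo_sec02_methodology").exists())  # False
        sync_sections(root, "demo", names)
        print((root / "demo_sec02_methodology" / "inputs").is_dir())  # True
```

Because existing folders are skipped, running the sync repeatedly is safe and a second run plans nothing.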
Project Structure
A typical LDA project has the following layout:
climate_analysis_2024/
├── climate_analysis_2024_config.yaml
├── climate_analysis_2024/
│ ├── lda_manifest.csv
│ ├── lda_playground/
│ │ ├── experiments/
│ │ ├── scratch/
│ │ ├── notebooks/
│ │ └── example_exploration.py
│ ├── climate_analysis_2024_sec01_introduction/
│ │ ├── README.md
│ │ ├── inputs/
│ │ ├── outputs/
│ │ ├── logs/
│ │ ├── run.py
│ │ └── run.R (if language=both)
│ └── climate_analysis_2024_sec02_methodology/
│ └── ... (same structure)
Configuration Files
Configuration files are named after your project. For example, the project "Climate Analysis 2024" gets the config file climate_analysis_2024_config.yaml.
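The naming rule can be sketched in a few lines of Python. This is a hypothetical helper: `project_code` and `config_filename` are illustrative names, not part of ldanalysis. The slug rule it uses (lowercase the name, then collapse runs of non-alphanumeric characters into underscores) is an assumption that reproduces the example above. The tool's exact rule may differ.

```python
import re


def project_code(name: str) -> str:
    """Assumed slug rule: lowercase, collapse non-alphanumeric runs to '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def config_filename(name: str) -> str:
    """Config files are named <project code>_config.yaml."""
    return f"{project_code(name)}_config.yaml"


print(project_code("Climate Analysis 2024"))     # climate_analysis_2024
print(config_filename("Climate Analysis 2024"))  # climate_analysis_2024_config.yaml
```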
Key configuration options:
project:
name: Climate Analysis 2024
code: climate_analysis_2024
analyst: Your Name
create_playground: true
language: python
sections:
  - name: data_prep
    inputs: []
    outputs: []
  - name: analysis
    inputs: []
    outputs: []
The Playground
The lda_playground directory is for exploratory work:
- experiments/: try out analysis approaches
- scratch/: temporary work
- notebooks/: Jupyter notebooks
- Example scripts in your chosen language(s)
Use it to test ideas before formalizing them into sections.
Working with Sections
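Each section represents a distinct analysis phase. A typical workflow:
- Navigate to the section folder (e.g., climate_analysis_2024_sec01_introduction/).
- Add input files to inputs/.
- Edit the run script (run.py or run.R).
- Execute the analysis by running the script.
- Find the results in outputs/.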
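To illustrate the inputs/ → run script → outputs/ flow, here is a hypothetical body for a section's run.py. It counts the data rows in each CSV file under inputs/, writes a summary to outputs/, and logs to logs/. The file names `row_counts.csv` and `run.log` and the `run` function are illustrative assumptions. The run.py that ldanalysis generates may look quite different.

```python
"""Hypothetical section run script: read inputs/, write outputs/, log to logs/."""
import csv
import logging
import tempfile
from pathlib import Path


def run(section_dir: Path) -> Path:
    inputs = section_dir / "inputs"
    outputs = section_dir / "outputs"
    logs = section_dir / "logs"
    outputs.mkdir(exist_ok=True)
    logs.mkdir(exist_ok=True)

    # Log to logs/run.log for this run only; the handler is removed afterwards.
    logger = logging.getLogger(f"section.{section_dir.name}")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(logs / "run.log")
    logger.addHandler(handler)
    try:
        summary = outputs / "row_counts.csv"
        with summary.open("w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["file", "rows"])
            for path in sorted(inputs.glob("*.csv")):
                with path.open(newline="") as src:
                    # Subtract the header row; an empty file counts as 0 rows.
                    rows = max(sum(1 for _ in csv.reader(src)) - 1, 0)
                writer.writerow([path.name, rows])
                logger.info("counted %d rows in %s", rows, path.name)
        return summary
    finally:
        logger.removeHandler(handler)
        handler.close()


if __name__ == "__main__":
    # Demo on a throwaway section folder.
    with tempfile.TemporaryDirectory() as tmp:
        section = Path(tmp)
        (section / "inputs").mkdir()
        (section / "inputs" / "data.csv").write_text("year,temp\n2023,14.9\n2024,15.1\n")
        print(run(section).read_text())
```

To use this as an actual run script placed inside a section folder, the demo block could instead call `run(Path(__file__).resolve().parent)`.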
Tracking Files
Register files in the manifest:
ldanalysis track data.csv --section data_prep --type input
ldanalysis track results.png --section analysis --type output
View tracked changes:
ldanalysis changes
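One common way to support this kind of change tracking is to record a content hash for each tracked file and later compare it with a fresh hash. The sketch below shows that technique with a CSV manifest. It is not ldanalysis's implementation. The `track` and `changes` functions and the manifest columns (`section`, `type`, `path`, `sha256`, `recorded_at`) are hypothetical, and the real lda_manifest.csv schema may differ.

```python
import csv
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical manifest columns; the real lda_manifest.csv schema may differ.
FIELDS = ["section", "type", "path", "sha256", "recorded_at"]


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(manifest: Path, file: Path, section: str, file_type: str) -> None:
    """Append one row recording the file's current content hash."""
    is_new = not manifest.exists()
    with manifest.open("a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "section": section,
            "type": file_type,
            "path": str(file),
            "sha256": sha256_of(file),
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        })


def changes(manifest: Path) -> list[str]:
    """Return tracked paths whose current hash differs from the latest record."""
    latest = {}
    with manifest.open(newline="") as fh:
        for row in csv.DictReader(fh):
            latest[row["path"]] = row["sha256"]  # later rows win
    return [
        path for path, recorded in latest.items()
        if not Path(path).exists() or sha256_of(Path(path)) != recorded
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data.csv"
        data.write_text("year,temp\n2024,15.1\n")
        manifest = Path(tmp) / "lda_manifest.csv"
        track(manifest, data, "data_prep", "input")
        data.write_text("year,temp\n2024,15.2\n")
        print(changes(manifest) == [str(data)])  # True
```

Appending rows instead of overwriting them keeps a history of every recorded version, which is what gives the manifest its provenance value.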
Best Practices
- Use descriptive project names: they become the folder and config file names.
- Plan sections up front: map them to the sections of your document.
- Start in the playground: test approaches before formalizing them.
- Track important files: the manifest maintains provenance.
- Use sync for updates: add sections as you need them.
- Document your work: update the README in each section.
Next Steps
- Explore the User Guide for detailed concepts
- Read about Configuration options
- Learn about File Tracking
- Understand Project Syncing
Quick Reference
# Install
uv tool install ldanalysis
# Create project
ldanalysis init --name "My Project"
# With sections
ldanalysis init --name "My Project" --sections "intro,methods,results"
# With R support
ldanalysis init --name "My Project" --language r
# Both languages
ldanalysis init --name "My Project" --language both
# No playground
ldanalysis init --name "My Project" --no-playground
# Check status
ldanalysis status
# Sync structure
ldanalysis sync
# Dry run
ldanalysis sync --dry-run
# Track files
ldanalysis track file.csv --section intro --type input
# View changes
ldanalysis changes