Concepts

Core Concepts¶

Understanding LDA's core concepts will help you get the most out of the tool. This guide explains the fundamental ideas behind Linked Document Analysis.

What is Linked Document Analysis?¶

Linked Document Analysis (LDA) is a methodology for managing complex projects where documents, data, and code are interconnected. It provides:

Structural Organization: Logical grouping of related files
Change Tracking: Monitoring modifications across all project files
Relationship Mapping: Understanding connections between documents
Historical Preservation: Complete audit trail of project evolution

graph LR
    A[Documents] <--> B[Data]
    B <--> C[Code]
    C <--> D[Results]
    D <--> A
    E[LDA] --> A
    E --> B
    E --> C
    E --> D

Key Concepts¶

1. Projects¶

A project is the top-level container for your work. Each project has:

Unique identifier (project.code)
Descriptive name (project.name)
Configuration file (lda_config.yaml)
Tracking directory (.lda/)

project:
  name: "Climate Research"
  code: "CLIMATE-2024"
  author: "Dr. Smith"
  description: "Analyzing temperature trends"

2. Sections¶

Sections are logical groupings of related files within your project. Think of them as chapters in a book or modules in a system.

sections:
  literature:
    name: "Literature Review"
    type: "documentation"
    files:
      - "docs/papers/*.pdf"
      - "docs/notes/*.md"

  analysis:
    name: "Data Analysis"
    type: "code"
    files:
      - "scripts/*.py"
      - "notebooks/*.ipynb"

Section Types¶

documentation: Text documents, reports, notes
data: Raw and processed data files
code: Scripts, programs, notebooks
outputs: Results, figures, generated content
resources: Supporting materials, references

3. File Tracking¶

LDA tracks files using multiple methods:

Content Hashing¶

SHA-256 hash of file contents
Detects even minor changes
Ensures file integrity

Metadata Tracking¶

File size
Modification time
Permissions
Custom attributes

tracked_file = {
    "path": "data/experiment_001.csv",
    "hash": "a34bf12c...",
    "size": 1024576,
    "modified": "2024-01-20T10:30:00",
    "section": "data",
    "metadata": {
        "experiment_id": "EXP-001",
        "researcher": "Dr. Smith"
    }
}

4. The Manifest¶

The manifest is LDA's central registry, stored in .lda/manifest.yaml. It contains:

Project metadata
Section definitions
File tracking information
Change history
Relationship mappings

manifest:
  version: "1.0"
  created: "2024-01-01"
  project:
    name: "Climate Research"
    code: "CLIMATE-2024"

  sections:
    documentation:
      files: 15
      last_modified: "2024-01-20"

    data:
      files: 42
      last_modified: "2024-01-19"

  tracking:
    total_files: 57
    total_size: 134217728  # 128 MB

5. Change Detection¶

LDA continuously monitors your project for changes:

Types of Changes¶

Added: New files created
Modified: Existing files changed
Deleted: Files removed
Moved: Files relocated
Renamed: Files renamed

Change Attributes¶

change:
  type: "modified"
  file: "docs/protocol.md"
  timestamp: "2024-01-20T14:30:00"
  size_before: 1024
  size_after: 1536
  hash_before: "abc123..."
  hash_after: "def456..."

6. Relationships¶

LDA maps relationships between files:

graph TD
    A[protocol.md] -->|references| B[data/exp001.csv]
    A -->|cites| C[literature/smith2023.pdf]
    B -->|processed by| D[scripts/analyze.py]
    D -->|generates| E[results/figure1.png]
    E -->|included in| F[report.md]

Types of relationships: - References: Document cites another - Depends on: Code requires data file - Generates: Script produces output - Includes: Document embeds content - Links to: Explicit connections

7. Workflows¶

Workflows define how your project evolves:

workflow:
  phases:
    - planning
    - data_collection
    - analysis
    - writing
    - review
    - publication

  current_phase: "analysis"

  rules:
    - "Data must be validated before analysis"
    - "All code must be reviewed"
    - "Documentation required for each phase"

8. Templates¶

Templates provide starting points for common project types:

template: research
features:
  - structured_sections
  - citation_tracking
  - experiment_logging
  - result_validation

Available templates: - research: Academic research projects - software: Software development - documentation: Technical writing - data-science: ML/AI projects - minimal: Basic structure

9. Configuration¶

LDA uses hierarchical configuration:

System defaults: Built-in settings
User config: ~/.ldarc
Project config: lda_config.yaml
Environment: LDA_* variables
Command line: Runtime options

# Project configuration
tracking:
  interval: 300  # 5 minutes
  ignore_patterns:
    - "*.tmp"
    - ".DS_Store"

display:
  theme: "modern"
  verbose: true

export:
  formats: ["html", "pdf", "json"]
  include_metadata: true

10. Export Formats¶

LDA can export project information in various formats:

Format	Use Case	Features
HTML	Web viewing	Interactive, styled
PDF	Printing/sharing	Formatted, portable
JSON	Integration	Machine-readable
CSV	Spreadsheets	Tabular data
Markdown	Documentation	Version control friendly

Architecture Overview¶

graph TB
    subgraph "LDA Core"
        A[Configuration Manager]
        B[File Tracker]
        C[Manifest Handler]
        D[Change Detector]
        E[Relationship Mapper]
    end

    subgraph "User Interface"
        F[CLI Commands]
        G[Python API]
        H[Web Dashboard]
    end

    subgraph "Storage"
        I[File System]
        J[.lda Directory]
        K[Config Files]
    end

    F --> A
    G --> A
    H --> A
    A --> B
    A --> C
    B --> D
    D --> E
    B --> I
    C --> J
    A --> K

Best Practices¶

Project Organization¶

Logical Sections: Group related files
Clear Naming: Use descriptive names
Consistent Structure: Follow patterns
Regular Commits: Version control integration

Configuration¶

Start Simple: Use defaults initially
Customize Gradually: Add as needed
Document Changes: Comment your config
Use Templates: Leverage existing patterns

Workflow¶

Regular Tracking: Run lda track frequently
Review Changes: Check modifications daily
Export Reports: Create regular summaries
Backup Manifest: Preserve project state

Advanced Concepts¶

Custom Plugins¶

Extend LDA with custom functionality:

from lda.plugin import Plugin

class CustomAnalyzer(Plugin):
    def analyze(self, project):
        # Custom analysis logic
        pass

Integration APIs¶

from lda import Project

# Load project
project = Project(".")

# Get file information
file_info = project.get_file("data/results.csv")

# Detect changes
changes = project.get_changes(since="1 hour ago")

# Export report
project.export(format="html", output="report.html")

Performance Optimization¶

For large projects:

Selective Tracking: Track only relevant files
Cached Operations: Enable caching
Parallel Processing: Use multiple cores
Incremental Updates: Track changes efficiently

Next Steps¶

Now that you understand the concepts:

- :material-cog:{ .lg .middle } __Configuration__ --- Detailed configuration options [:octicons-arrow-right-24: Configure](configuration.md) - :material-file-multiple:{ .lg .middle } __Templates__ --- Using project templates [:octicons-arrow-right-24: Explore](templates.md) - :material-workflow:{ .lg .middle } __Workflows__ --- Common workflow patterns [:octicons-arrow-right-24: Learn](workflows.md)