# Data Management
The Leaderboard system uses a hybrid data storage approach for flexibility and maintainability.
## Data Repository Structure

```
data-repo/
├── config.yaml          # Configuration
├── theme.css            # Custom theme (optional)
├── .leaderboard.db      # LibSQL database
├── assets/              # Custom assets (optional)
│   ├── logo.png
│   ├── banner.jpg
│   └── icons/
│       └── favicon.ico
├── contributors/        # Contributor profiles (Markdown)
│   ├── alice.md
│   ├── bob.md
│   └── charlie.md
└── activities/          # Activity records (JSONL)
    ├── alice.jsonl
    ├── bob.jsonl
    └── charlie.jsonl
```

## Storage Strategies
| Data Type | Format | Import | Export | Rationale |
|---|---|---|---|---|
| Contributors | Markdown + YAML | ✅ | ✅ | Human-editable profiles with rich bio content |
| Activity Definitions | Database only | ❌ | ❌ | Managed by plugins, no manual editing needed |
| Activities | Sharded JSONL | ✅ | ✅ | Efficient for large datasets, easy per-user updates |
| Assets | Static files | ✅ | N/A | Custom images and files served from data repository |
## Custom Assets

The `assets/` directory in your data repository allows you to serve custom static files (images, icons, etc.) alongside your leaderboard. This is useful for hosting logos, banners, favicons, and other assets without needing external hosting.
### Setup

1. Create an `assets` directory in your data repository:

   ```sh
   mkdir -p data/assets
   ```

2. Add your files to the assets directory:

   ```
   data/assets/
   ├── logo.png
   ├── banner.jpg
   └── icons/
       └── favicon.ico
   ```

3. The assets are automatically copied to `public/assets/` during the build and dev process.
### Usage

Reference assets in your `config.yaml` using the `/assets/` path:

```yaml
org:
  name: My Organization
  logo_url: /assets/logo.png # ← Served from data/assets/logo.png

meta:
  title: My Leaderboard
  image_url: /assets/banner.jpg # ← Served from data/assets/banner.jpg
  favicon_url: /assets/icons/favicon.ico # ← Served from data/assets/icons/favicon.ico
```

### How It Works
- During `pnpm dev` and `pnpm build`, the `setup:assets` script automatically copies all files from `data/assets/` to `public/assets/` in the web project
- Files in `public/assets/` are served as static assets at `/assets/*`
- The `public/assets/` directory is gitignored, so assets remain in your data repository only
- You can organize assets in subdirectories (e.g., `assets/icons/`, `assets/images/`)
### Benefits

- **Self-contained:** Keep all configuration and assets in your data repository
- **Version control:** Track asset changes in your data repository's git history
- **No external dependencies:** No need for external image hosting or a CDN for simple assets
- **Flexible paths:** Use relative paths (`/assets/*`) in your config instead of absolute URLs
### Example

```yaml
# config.yaml
org:
  name: Acme Corp
  logo_url: /assets/logo.svg
  url: https://acme.com

meta:
  title: Acme Contributors
  image_url: /assets/og-image.png
  site_url: https://contributors.acme.com
  favicon_url: /assets/favicon.ico
```

## Contributor Profiles
### Format

Markdown files with YAML frontmatter:

```markdown
---
username: alice
name: Alice Smith
role: core
title: Senior Engineer
avatar_url: https://github.com/alice.png
social_profiles:
  github: https://github.com/alice
  linkedin: https://linkedin.com/in/alice
joining_date: 2020-03-15
meta:
  team: backend
  timezone: PST
---

Alice is a senior engineer specializing in backend systems and distributed computing.
She has been contributing to the project since its inception and leads the infrastructure team.

## Expertise

- Kubernetes
- PostgreSQL
- High-performance system design
```

### Schema

Frontmatter fields:

- `username` (string, required) - Unique identifier
- `name` (string) - Full name
- `role` (string) - Role identifier (must match `config.yaml`)
- `title` (string) - Job title or designation
- `avatar_url` (string) - Profile picture URL
- `social_profiles` (object) - Social media links
- `joining_date` (string, YYYY-MM-DD) - Join date
- `meta` (object) - Custom metadata

**Content:** Bio in Markdown format
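For reference, the schema above maps onto a TypeScript shape roughly like the following. This is a sketch: the interface name is mine, and treating every field except `username` as optional is an assumption based on the "required" annotations above.

```typescript
// Sketch of the contributor frontmatter schema. Only `username` is
// documented as required; the optionality of the rest is an assumption.
interface ContributorProfile {
  username: string;                         // unique identifier (required)
  name?: string;                            // full name
  role?: string;                            // must match a role in config.yaml
  title?: string;                           // job title or designation
  avatar_url?: string;                      // profile picture URL
  social_profiles?: Record<string, string>; // social media links
  joining_date?: string;                    // YYYY-MM-DD
  meta?: Record<string, unknown>;           // custom metadata
}

const alice: ContributorProfile = {
  username: "alice",
  name: "Alice Smith",
  role: "core",
  joining_date: "2020-03-15",
};
```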
### Manual Editing

Contributors can be added or edited manually:

1. Create or edit the file `contributors/<username>.md`
2. Write the frontmatter and bio
3. Commit and push changes
4. The next build will include the updates
## Activities

### Format

JSON Lines (JSONL), one file per contributor:

```jsonl
{"slug":"alice-pr-123","contributor":"alice","activity_definition":"pr_merged","title":"Fix auth bug","occurred_at":"2024-01-15T10:30:00Z","link":"https://github.com/org/repo/pull/123","points":10}
{"slug":"alice-issue-45","contributor":"alice","activity_definition":"issue_opened","title":"Add rate limiting","occurred_at":"2024-01-16T14:20:00Z","link":"https://github.com/org/repo/issues/45","points":5}
```

### Schema

Each line is a JSON object with:

- `slug` (string, required) - Unique identifier
- `contributor` (string, required) - Username
- `activity_definition` (string, required) - Activity type slug
- `title` (string) - Activity title
- `occurred_at` (string, required) - ISO 8601 timestamp
- `link` (string) - URL to activity
- `text` (string) - Additional text/description
- `points` (number) - Points awarded
- `meta` (object) - Custom metadata
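Reading a shard only requires `JSON.parse` per line plus a check for the four required fields. A minimal sketch (the function name and error wording are illustrative, not the project's API):

```typescript
// Sketch of the activity schema above; optional fields mirror the list.
interface Activity {
  slug: string;                 // required, unique identifier
  contributor: string;          // required, username
  activity_definition: string;  // required, activity type slug
  occurred_at: string;          // required, ISO 8601 timestamp
  title?: string;
  link?: string;
  text?: string;
  points?: number;
  meta?: Record<string, unknown>;
}

// Parse one JSONL line and verify the required fields are present.
function parseActivityLine(line: string): Activity {
  const obj = JSON.parse(line);
  for (const field of ["slug", "contributor", "activity_definition", "occurred_at"]) {
    if (!(field in obj)) throw new Error(`Missing required field: ${field}`);
  }
  return obj as Activity;
}
```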
### Sharding

Activities are sharded by contributor for:

- **Easier updates:** Only the affected user's file changes
- **Better performance:** Parallel import/export
- **Smaller diffs:** Git history stays cleaner
- **Scalability:** Works with thousands of contributors
## Activity Definitions

### Format

SQLite table in `.leaderboard.db`:

```sql
CREATE TABLE activity_definition (
  slug VARCHAR PRIMARY KEY,
  name VARCHAR NOT NULL,
  description TEXT NOT NULL,
  points SMALLINT,
  icon VARCHAR
);
```

### Management
Activity definitions are:

- Created by plugins during the `setup()` phase
- Persisted in the database file
- Not exported to separate files
- Recreated if the database is deleted

Example:

```javascript
await ctx.db.execute(`
  INSERT OR IGNORE INTO activity_definition
    (slug, name, description, points, icon)
  VALUES
    ('pr_merged', 'PR Merged', 'Pull request was merged', 10, 'git-merge')
`);
```

## Database
### LibSQL

**Technology:** SQLite-compatible database with modern features

**Location:** `${DATA_DIR}/.leaderboard.db`

**Purpose:**

- Primary data store during the build process
- Efficient querying for SSG
- Persistent storage for activity definitions

### Access

The database is accessed by:

- **Plugin Runner:** Writes data during scraping
- **Next.js Build:** Reads data for SSG
- **Data Explorer:** Browser-based SQL REPL via WebAssembly (read-only, end-user facing)

### Connection

```javascript
import { createDatabase } from "@ohcnetwork/leaderboard-api";

const db = createDatabase("file:./data/.leaderboard.db");
```

## Data Lifecycle
### Import Phase

Process:

1. Read all Markdown files from `contributors/`
2. Parse YAML frontmatter and content
3. Insert into the `contributor` table
4. Read all JSONL files from `activities/`
5. Parse each line as JSON
6. Insert into the `activity` table
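Steps 1–2 amount to splitting each profile file on its `---` delimiters before handing the YAML to a parser. A minimal regex-based sketch (the real importer may use a dedicated frontmatter library):

```typescript
// Split a contributor Markdown file into its raw YAML frontmatter block
// and the Markdown bio that follows. A sketch only; the actual importer
// may use a dedicated frontmatter parser instead of a regex.
function splitFrontmatter(md: string): { frontmatter: string; bio: string } {
  const match = md.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!match) throw new Error("No frontmatter block found");
  return { frontmatter: match[1], bio: match[2].trim() };
}
```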
### Plugin Phase

Process:

1. Execute `setup()` for all plugins (inserts config + plugin badge definitions)
2. Plugins populate the `activity_definition` table
3. Execute `scrape()` for all plugins
4. Plugins fetch data from APIs
5. Plugins insert into the `activity` table
6. Main leaderboard aggregation runs (standard aggregates)
7. Execute `aggregate()` for plugins that define it
8. Plugins compute custom aggregates
9. Evaluate config badge rules
10. Evaluate plugin badge rules
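The ordering above implies a plugin contract roughly like the following. This is a sketch of the shape implied by the phases, not the plugin runner's actual interface, and it is simplified to a single plugin; the real runner executes each phase across all plugins before moving to the next.

```typescript
interface PluginContext {
  db: unknown; // database handle passed to each phase (type is an assumption)
}

// Sketch of the plugin contract implied by the phase ordering above;
// the actual interface in the plugin runner may differ.
interface LeaderboardPlugin {
  setup(ctx: PluginContext): Promise<void>;      // insert activity definitions
  scrape(ctx: PluginContext): Promise<void>;     // fetch data, insert activities
  aggregate?(ctx: PluginContext): Promise<void>; // optional custom aggregates
}

// Run one plugin through its phases in order, recording which phases ran.
async function runPlugin(plugin: LeaderboardPlugin, ctx: PluginContext): Promise<string[]> {
  const phases: string[] = [];
  await plugin.setup(ctx);
  phases.push("setup");
  await plugin.scrape(ctx);
  phases.push("scrape");
  if (plugin.aggregate) {
    await plugin.aggregate(ctx);
    phases.push("aggregate");
  }
  return phases;
}
```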
### Export Phase

Process:

1. Read all contributors from the database
2. Serialize to Markdown with frontmatter
3. Write to `contributors/<username>.md`
4. Read all activities from the database
5. Group by contributor
6. Serialize to JSONL
7. Write to `activities/<username>.jsonl`
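Steps 5–6 can be sketched as grouping rows by contributor and joining each group with newlines, one shard per user. The function name is illustrative, not the exporter's actual API:

```typescript
// Group activity rows by contributor and serialize each group as JSONL.
// Each map entry corresponds to one activities/<username>.jsonl shard.
// An illustrative sketch of export steps 5-6, not the real exporter.
function toJsonlShards(
  activities: Array<{ contributor: string; [key: string]: unknown }>
): Map<string, string> {
  const grouped = new Map<string, string[]>();
  for (const activity of activities) {
    const lines = grouped.get(activity.contributor) ?? [];
    lines.push(JSON.stringify(activity));
    grouped.set(activity.contributor, lines);
  }
  const shards = new Map<string, string>();
  for (const [user, lines] of grouped) {
    shards.set(user, lines.join("\n") + "\n");
  }
  return shards;
}
```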
### Build Phase

Process:

1. Next.js reads from the LibSQL database
2. Static pages are generated with the data
3. HTML/CSS/JS files are output
## Backup & Recovery

### Full Backup

```sh
# Backup entire data repository
tar -czf data-backup-$(date +%Y%m%d).tar.gz data-repo/
```

### Selective Backup

```sh
# Backup only contributors
tar -czf contributors-$(date +%Y%m%d).tar.gz data-repo/contributors/

# Backup only activities
tar -czf activities-$(date +%Y%m%d).tar.gz data-repo/activities/
```

### Recovery

```sh
# Restore from backup
tar -xzf data-backup-20240101.tar.gz
```

### Git History

Since data is in Git:

```sh
# Revert to previous state
git checkout HEAD~1 contributors/alice.md

# View history
git log --follow contributors/alice.md
```

## Data Migration
### From Legacy Format

If migrating from a different system:

```javascript
// migration.js
import {
  createDatabase,
  contributorQueries,
} from "@ohcnetwork/leaderboard-api";
import { exportContributors } from "@leaderboard/plugin-runner";

const db = createDatabase(":memory:");

// Import legacy data (legacyData is loaded from your old system)
for (const legacyUser of legacyData) {
  await contributorQueries.upsert(db, {
    username: legacyUser.login,
    name: legacyUser.name,
    // ... map fields
  });
}

// Export to new format
await exportContributors(db, "./data-repo", logger);
```

### Between Formats

To change the activity format:

1. Import activities into the database
2. Transform them in the database
3. Export to the new format
## Data Validation

### Automatic Validation

The plugin runner validates:

- Markdown frontmatter structure
- JSON validity in JSONL files
- Foreign key references
- Required fields

### Manual Validation

Check contributor profiles:

```sh
# List files missing a username field
grep -L "^username:" contributors/*.md

# Validate YAML frontmatter (extract the block between the --- delimiters
# before parsing, since the Markdown bio is not valid YAML)
for f in contributors/*.md; do
  sed -n '/^---$/,/^---$/p' "$f" | sed '1d;$d' | yq e '.' - > /dev/null || echo "Invalid: $f"
done
```

Check activity files:

```sh
# Validate JSON lines (jq parses the stream of objects line by line)
for f in activities/*.jsonl; do
  jq empty "$f" 2>/dev/null || echo "Invalid JSON in: $f"
done
```

## Performance Considerations
### Import Performance

- **Parallel processing:** Contributors and activities are imported concurrently
- **Streaming:** Large JSONL files are processed line-by-line
- **Batch inserts:** Multiple activities are inserted per transaction

### Export Performance

- **Sharded output:** One file per contributor enables parallel export
- **Incremental updates:** Only modified files are written
- **Efficient queries:** Database indexes optimize reads

### Database Size

Typical sizes:

- Contributors: ~1 KB per contributor
- Activities: ~200 bytes per activity
- Database overhead: ~100 KB

Example: 100 contributors × 1,000 activities each = 100,000 activities ≈ 20 MB database
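The example figure follows directly from the per-row estimates; a back-of-envelope helper using the constants listed above:

```typescript
// Back-of-envelope database size estimate using the per-row figures above.
// The constants are the rough sizes from the list; real sizes vary.
function estimateDbBytes(contributors: number, activitiesPerContributor: number): number {
  const CONTRIBUTOR_BYTES = 1_024;    // ~1 KB per contributor
  const ACTIVITY_BYTES = 200;         // ~200 bytes per activity
  const OVERHEAD_BYTES = 100 * 1_024; // ~100 KB fixed overhead
  return (
    contributors * CONTRIBUTOR_BYTES +
    contributors * activitiesPerContributor * ACTIVITY_BYTES +
    OVERHEAD_BYTES
  );
}
```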
## Best Practices

- **Commit frequently:** Small commits are easier to review
- **Use branches:** Test changes before merging to main
- **Validate before committing:** Check data integrity
- **Backup regularly:** Automate backups in CI/CD
- **Monitor database size:** Archive old activities if needed
- **Document custom fields:** Explain `meta` field usage