š Checking Your GitHub Repo's Health with Kvatch
When you're maintaining an open-source project, it's useful to have a quick snapshot of your repository's "health." How many files are in the repo? How many issues are open vs closed? When was the first issue created, and what's the most recent one?
With Kvatch, we can treat a Git repo + the GitHub API as federated data sources and query them together as if they were just tables in a database.
š Step 1 ā Define the plan
Here's the plan.yaml that combines a Git repo with the GitHub Issues API:
name: github-repo-health
storage:
type: sqlite
data_store_path: github_repo_health.db
connectors:
- name: repo_git
type: GIT
connection:
repo: "https://github.com/kvatch-hub/kvatch-cli.git"
branch: "main"
path: "examples/git/data"
desc: "Cloned Git repo with CSV and source files"
- name: github_api
type: API
connection:
url: "https://api.github.com/repos/kvatch-hub/kvatch-cli/issues"
method: "GET"
desc: "GitHub API for repo issues"
datasets:
- name: repo_files
connector_name: repo_git
type: CSV
query: "users.csv"
- name: repo_issues
connector_name: github_api
type: JSON
options:
flatten_nested_objects: true
query: "$"
- name: repo_issue_summary
connector_name: federated
type: SQL
query: |
SELECT
state,
COUNT(*) AS issue_count,
MIN(created_at) AS first_issue_date,
MAX(created_at) AS last_issue_date
FROM repo_issues
GROUP BY state
children:
- dataset_name: repo_issues
- name: kvatch_repo
connector_name: federated
type: SQL
query: |
SELECT
(SELECT COUNT(*) FROM repo_files) AS total_files,
(SELECT COUNT(*) FROM repo_issues WHERE state = 'open') AS open_issues,
(SELECT COUNT(*) FROM repo_issues WHERE state = 'closed') AS closed_issues,
(SELECT MIN(created_at) FROM repo_issues) AS first_issue_date,
(SELECT MAX(created_at) FROM repo_issues) AS last_issue_date
children:
- dataset_name: repo_files
- dataset_name: repo_issues
output:
dataset_name: kvatch_repo
format: tableā¶ļø Step 2 ā Run the query
Run the plan with:
./kvatch query --plan github-repo-health.yaml
š Example output
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā total_files | open_issues | closed_issues | first_issue_date | last_issue_date āāāāāāāāāāāāāā¼āāāāāāāāāāāāāā¼āāāāāāāāāāāāāāāā¼āāāāāāāāāāāāāāāāāāā¼āāāāāāāāāāāāāāāā 42 | 3 | 15 | 2024-08-01 | 2025-09-20 āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
š” Why this is useful
- Combine repo contents (via Git) and live GitHub API data.
- Get a single dashboard row summarizing project health.
- Works with any repo you care about ā just swap out the Git + GitHub API connectors.
This approach can be extended to track PRs, contributors, or even join GitHub data with financial or operational datasets.
š Next time, we'll take it further: analyzing issue churn and average time-to-close for your repo.
š Kvatch CLI is Now Available!
No more waiting! Download Kvatch CLI today and start federating your data sources.
Ready to start federating your data?
Download Kvatch CLI and start querying across all your data sources in minutes.