Primary repository for the Data Engineering team at NYC Department of City Planning (DCP). We build and maintain geospatial and tabular data products for internal and external use.
Product metadata now lives in the product-metadata/ directory of this repo. The standalone NYCPlanning/product-metadata repository has been deprecated and archived.
| Path | Purpose |
|---|---|
| `dcpy/` | Core Python package: lifecycle orchestration, connectors, utilities |
| `products/` | One folder per data product: code, dbt models, recipe files, README |
| `product-metadata/` | Dataset specifications (`metadata.yml`) for DCP products, copied from the former product-metadata repo |
| `ingest_templates/` | YAML specs for extracting and archiving source datasets |
| `apps/` | Docker Compose services: nginx reverse proxy, QA/QAQC Streamlit app (`/qaqc`), Dagster orchestration UI (`/dag`), marimo notebook server |
| `docs/` | Technical reference (see below) |
| `experimental/` | Sandbox for prototyping; not production code |

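
As a rough illustration of the one-folder-per-product layout, the sketch below lists product names by scanning `products/`. The helper `list_products` is hypothetical and not part of `dcpy`; it only assumes that each product is a directory directly under `products/`.

```python
from pathlib import Path


def list_products(repo_root: Path) -> list[str]:
    """Return the names of the folders directly under products/, sorted.

    Hypothetical helper for illustration; not part of dcpy.
    """
    products_dir = repo_root / "products"
    if not products_dir.is_dir():
        return []
    # Skip plain files and hidden directories; keep one entry per product folder.
    return sorted(
        entry.name
        for entry in products_dir.iterdir()
        if entry.is_dir() and not entry.name.startswith(".")
    )
```

Calling `list_products(Path("."))` from the repository root would return whichever folder names exist under `products/` at that time.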
Each product lives under products/<name>/ and follows a standard pipeline from source data to public distribution:
Ingest → Build → Draft → QA → Publish
- Ingest: extract source datasets from APIs or files and archive to `edm-recipes` (S3)
- Build: load archived data into Postgres, run dbt/SQL transforms
- Draft: promote build output to the S3 `draft` folder; run automated QA checks
- QA: domain experts and GIS team review; address issues and rebuild as needed
- Publish: promote approved draft to the `publish` folder for distribution
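
The stage order above, including the rebuild loop out of QA, can be modeled as a small state machine. This is a minimal sketch for illustration only: the `Stage` enum and `next_stage` function are hypothetical names and do not reflect the actual `dcpy` lifecycle API.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Lifecycle stages named in the pipeline (illustrative, not dcpy's API)."""

    INGEST = "ingest"
    BUILD = "build"
    DRAFT = "draft"
    QA = "qa"
    PUBLISH = "publish"


# Forward order of the pipeline: Ingest -> Build -> Draft -> QA -> Publish.
PIPELINE = [Stage.INGEST, Stage.BUILD, Stage.DRAFT, Stage.QA, Stage.PUBLISH]


def next_stage(current: Stage, qa_passed: bool = True) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after Publish.

    A failed QA review sends the product back to Build, mirroring
    "address issues and rebuild as needed".
    """
    if current is Stage.QA and not qa_passed:
        return Stage.BUILD
    index = PIPELINE.index(current)
    if index + 1 < len(PIPELINE):
        return PIPELINE[index + 1]
    return None
```

For example, `next_stage(Stage.QA, qa_passed=False)` returns `Stage.BUILD`, while `next_stage(Stage.QA)` returns `Stage.PUBLISH`.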
For the full workflow including GIS team review and issue tracking conventions, see the Data Update Workflow wiki page.
See the Developer Setup wiki page for onboarding and the recommended Docker dev container. For manual (uv/venv) setup and Python dependency management, see docs/development.md.
- dbt project conventions: model layers, materialization, geometry standards, linting
- dcpy package structure: module layers and import rules
- dcpy architecture & import flow: layered dependency model + `tach` enforcement
- Test strategy: suites, how to run them, conventions
- Developer conventions: git/PR flow, formatting, comment tags
- SQL reference: Postgres/MSSQL query and admin snippets
- Local development: manual (uv/venv) setup and dependency management
- Bash scripts & CLI tools: available utilities on `PATH`
The wiki covers team and operational content: About Us · Cloud Infrastructure · Data Catalog · Environment Management · Product pages
