Glossary

Namespace

Atomic unit of the knowledge system containing data and metadata

  • defines (optionally)

    • tables

    • composite types

    • entity classes

  • can import other namespaces

  • a dataset or the output of a step in a project

  • has different environments

Dataset

A set of defined tables in a namespace with metadata and different environments

  • represented in one git repository

  • one namespace

Project

A pipeline built on datasets

  • represented in one git repository

  • steps of the pipeline create can namespaces

Metadata

Information describing the knowledge in a namespace

  • defined tables, composite types and entity classes

Artifact Metadata

  • imported namespaces, with prefix

  • metadata for all namespaces

Config

Parameters that can change from run to run

  • for a dataset

    • the environments to create

    • remotes to upload them to

  • for a project

    • the environments of the imported namespaces to use

    • parameters of the steps in the pipeline

Environment

A subset or a scrambled version of a set of data tables

  • changes by branch of a project

  • many on one branch of dataset, created based on config and script

  • defined by the environments of the sources for a project step

Feature

A named set of columns in a table

  • can be primitive feature, foreign key or composite feature

Subject of Records

Entity class that is represented in a table

Step

An element of the pipeline, collected in topmodules for a project and executed as one function with explicitly dtated outputs and dependencies

  • is logged in dvc

Topmodule

Python module that is a direct child of the root src module

Child Module

Module that is nested under a topmodule