Glossary
========

Namespace
^^^^^^^^^

Atomic unit of the knowledge system, containing data and metadata

- defines (optionally)

  - tables
  - composite types
  - entity classes

- can import other namespaces
- is either a dataset or the output of a step in a project
- has different environments

Dataset
^^^^^^^

A set of defined tables in a namespace, with metadata and different environments

- represented in one git repository
- one namespace

Project
^^^^^^^

A pipeline built on datasets

- represented in one git repository
- steps of the pipeline can create namespaces

Metadata
^^^^^^^^

Information describing the knowledge in a namespace

- defined tables, composite types and entity classes

Artifact Metadata
^^^^^^^^^^^^^^^^^

- imported namespaces, with prefix
- metadata for all namespaces

Config
^^^^^^

Parameters that can change from run to run

- for a dataset

  - the environments to create
  - remotes to upload them to

- for a project

  - the environments of the imported namespaces to use
  - parameters of the steps in the pipeline

Environment
^^^^^^^^^^^

A subset or a scrambled version of a set of data tables

- changes by branch of a project
- many on one branch of a dataset, created based on config and script
- defined by the environments of the sources for a project step

Feature
^^^^^^^

A named set of columns in a table

- can be a primitive feature, a foreign key or a composite feature

Subject of Records
^^^^^^^^^^^^^^^^^^

Entity class that is represented in a table

Step
^^^^

An element of the pipeline, collected in topmodules for a project and executed as one function with explicitly stated outputs and dependencies

- is logged in dvc

Topmodule
^^^^^^^^^

Python module that is a direct child of the root src module

Child Module
^^^^^^^^^^^^

Module that is nested under a topmodule
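
The distinction between a topmodule and a child module can be illustrated with a small sketch. It assumes modules are referred to by their dotted import path and that the root module is named ``src``, as in the definition above. The helper names ``classify_module`` and ``topmodule_of`` and the example module names are hypothetical and not part of any existing API.

```python
ROOT_MODULE = "src"  # root module name, taken from the glossary definition


def classify_module(dotted_name: str, root: str = ROOT_MODULE) -> str:
    """Classify a dotted module path relative to the root module.

    Hypothetical helper: returns "root", "topmodule", "child module",
    or "outside" if the path does not start at the root module.
    """
    parts = dotted_name.split(".")
    if parts[0] != root:
        return "outside"
    depth = len(parts) - 1  # number of levels below the root module
    if depth == 0:
        return "root"
    if depth == 1:
        return "topmodule"  # direct child of the root module
    return "child module"  # nested under some topmodule


def topmodule_of(dotted_name: str, root: str = ROOT_MODULE):
    """Return the topmodule that a module belongs to, or None."""
    parts = dotted_name.split(".")
    if parts[0] != root or len(parts) < 2:
        return None
    return ".".join(parts[:2])


if __name__ == "__main__":
    # "load_data" and "helpers" are made-up module names for illustration.
    for name in ("src", "src.load_data", "src.load_data.helpers"):
        print(name, "->", classify_module(name), "/", topmodule_of(name))
```

Under these assumptions, ``src.load_data`` is a topmodule, and ``src.load_data.helpers`` is a child module belonging to it.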