Components

Component basics

There are a few basic components that underpin rail_pz_service functionality.

Generating a Request requires a CatalogTag, which specifies the columns to expect in the Dataset that the request is run on, and an Estimator to run the analysis. The Estimator uses a Model trained for the Algorithm that underlies it, and that Model must be compatible with the CatalogTag of the Dataset.
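The compatibility rule described above can be sketched with plain dataclasses. This is an illustration only: the `*Sketch` classes and the `can_run` helper are hypothetical stand-ins, not the rail_pz_service database classes, and the real service may enforce these checks differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogTagSketch:
    """Hypothetical stand-in: a tag name plus the columns it expects."""
    name: str
    columns: tuple[str, ...]


@dataclass(frozen=True)
class ModelSketch:
    """Hypothetical stand-in: a model trained for one algorithm and one tag."""
    name: str
    algo_name: str
    catalog_tag: CatalogTagSketch


@dataclass(frozen=True)
class EstimatorSketch:
    """Hypothetical stand-in: an algorithm paired with a trained model."""
    name: str
    algo_name: str
    model: ModelSketch
    catalog_tag: CatalogTagSketch


@dataclass(frozen=True)
class DatasetSketch:
    """Hypothetical stand-in: a dataset labelled with its catalog tag."""
    name: str
    catalog_tag: CatalogTagSketch


def can_run(estimator: EstimatorSketch, dataset: DatasetSketch) -> bool:
    """Check that the model fits the estimator's algorithm and that all tags agree."""
    same_algo = estimator.model.algo_name == estimator.algo_name
    same_tag = (
        estimator.catalog_tag == dataset.catalog_tag
        and estimator.model.catalog_tag == dataset.catalog_tag
    )
    return same_algo and same_tag
```

A Request would only make sense for an (Estimator, Dataset) pair for which a check like `can_run` returns True.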

Request

class rail_pz_service.db.Request(**kwargs)

Basic processing unit in rail_pz_service: a Request to generate per-galaxy p(z) for all of the objects in a particular Dataset using a specific Estimator.

The output p(z) distribution will be stored in a qp file.

This also stores some metadata, including timestamps and the user who initiated the Request.

pydantic_mode_class: alias of Request

id: Mapped[int]: primary key

user: Mapped[str]: User who originated this Request

estimator_id: Mapped[int]: foreign key into estimator table

dataset_id: Mapped[int]: foreign key into dataset table

qp_file_path: Mapped[str | None]: path to the output file

time_created: Mapped[datetime]: timestamp of when the request was created in the DB

time_started: Mapped[datetime | None]: timestamp of when an Estimator started processing the request

time_finished: Mapped[datetime | None]: timestamp of when the request processing was finished

estimator_: Mapped[Estimator]: Access to associated Estimator

dataset_: Mapped[Dataset]: Access to associated Dataset

col_names_for_table = ['id', 'user', 'estimator_id', 'dataset_id', 'qp_file_path']: column names to use when printing the table
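Because time_started and time_finished may be None, the three timestamps are enough to infer roughly where a Request is in its lifecycle. The sketch below assumes exactly that; the helper names and the "pending", "running", and "finished" labels are hypothetical and are not part of rail_pz_service.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def request_status(time_started: datetime | None, time_finished: datetime | None) -> str:
    """Infer a coarse status label from the nullable timestamps (labels are hypothetical)."""
    if time_finished is not None:
        return "finished"
    if time_started is not None:
        return "running"
    return "pending"


def processing_time(
    time_started: datetime | None, time_finished: datetime | None
) -> timedelta | None:
    """Return the processing duration, or None if the request has not finished."""
    if time_started is None or time_finished is None:
        return None
    return time_finished - time_started
```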

Dataset

class rail_pz_service.db.Dataset(**kwargs)

Color data about a set of objects that can be used to obtain p(z) estimates.

It is associated with a CatalogTag that defines which column names to expect.

It can either be stored in a file (for larger datasets) or as a Python dict (for small datasets of a few objects, useful when uploading things on the fly).

pydantic_mode_class: alias of Dataset

id: Mapped[int]: primary key

name: Mapped[str]: Name for this Dataset, unique

n_objects: Mapped[int]: Number of objects in the dataset

path: Mapped[str | None]: Path to the relevant file (could be None)

data: Mapped[dict | None]: Data for the dataset (could be None)

catalog_tag_id: Mapped[int]: foreign key into catalog_tag table

catalog_tag_: Mapped[CatalogTag]: Access to associated CatalogTag

requests_: Mapped[list[Request]]: Access to list of associated Request

col_names_for_table = ['id', 'name', 'n_objects', 'catalog_tag_id', 'path']: column names to use when printing the table

classmethod validate_data_for_path(path, catalog_tag)

Validate that these data are appropriate for the CatalogTag

Parameters:

path (Path) – File with the data
catalog_tag (CatalogTag) – CatalogTag in question

Returns:

Size of the dataset

Return type:

int

classmethod validate_data(data, catalog_tag)

Validate that these data are appropriate for the CatalogTag

Parameters:

data (dict) – Data in question
catalog_tag (CatalogTag) – CatalogTag in question

Returns:

Size of the dataset, data formatted as strings

Return type:

tuple[int, dict[str, list[float]]]
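A minimal sketch of the kind of check validate_data describes, assuming the CatalogTag can be reduced to a list of expected column names. The function name validate_columns, the column names used in the usage note, and the specific exceptions raised are all hypothetical; the actual classmethod may validate differently.

```python
def validate_columns(
    data: dict, expected_columns: list[str]
) -> tuple[int, dict[str, list[float]]]:
    """Check that every expected column is present and return (size, cleaned data)."""
    missing = [col for col in expected_columns if col not in data]
    if missing:
        raise KeyError(f"Missing columns: {missing}")

    # Keep only the expected columns and coerce every value to float.
    cleaned = {col: [float(val) for val in data[col]] for col in expected_columns}

    # All columns must describe the same number of objects.
    lengths = {len(values) for values in cleaned.values()}
    if len(lengths) > 1:
        raise ValueError(f"Columns have inconsistent lengths: {sorted(lengths)}")

    n_objects = lengths.pop() if lengths else 0
    return n_objects, cleaned
```

For example, validate_columns({"mag_g": [21, 22.5], "mag_r": [20.5, 21.0]}, ["mag_g", "mag_r"]) returns the number of objects together with the float-valued columns.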

Estimator

class rail_pz_service.db.Estimator(**kwargs)

Combination of an Algorithm, a trained Model to apply to the data, and any specific configuration overrides.

pydantic_mode_class: alias of Estimator

id: Mapped[int]: primary key

name: Mapped[str]: Name for this Estimator, unique

algo_id: Mapped[int]: foreign key into ‘Algorithm’ table

catalog_tag_id: Mapped[int]: foreign key into ‘CatalogTag’ table

model_id: Mapped[int]: foreign key into ‘Model’ table

config: Mapped[dict | None]: Configuration parameters for this estimator

algo_: Mapped[Algorithm]: Access to associated Algorithm

catalog_tag_: Mapped[CatalogTag]: Access to associated CatalogTag

model_: Mapped[Model]: Access to associated Model

requests_: Mapped[list[Request]]: Access to list of associated Request

col_names_for_table = ['id', 'name', 'algo_id', 'catalog_tag_id', 'model_id']: column names to use when printing the table
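One way to think about the config field is as a set of overrides layered on top of an algorithm's default parameters. The sketch below assumes that interpretation; effective_config is a hypothetical helper, and the parameter names in the usage note are examples only.

```python
from __future__ import annotations


def effective_config(defaults: dict, overrides: dict | None) -> dict:
    """Return the defaults with any overrides applied, leaving both inputs untouched."""
    merged = dict(defaults)  # shallow copy so the defaults are not mutated
    if overrides:
        merged.update(overrides)
    return merged
```

With defaults {"zmax": 3.0, "nzbins": 301} and overrides {"nzbins": 101}, only nzbins changes; a config of None leaves the defaults as they are.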

Model

class rail_pz_service.db.Model(**kwargs)

Specific ML model that is trained to work with a specific Algorithm on a particular type of data (CatalogTag).

Typically a Model is stored as a pickle file.

The rail.core.model.Model class provides a standard wrapper to store metadata such as the name of the Python class that created the model, and the applicable CatalogTag to use the model with.

pydantic_mode_class: alias of Model

id: Mapped[int]: primary key

name: Mapped[str]: Name for this Model, unique

path: Mapped[str]: Path to the relevant file

algo_id: Mapped[int]: foreign key into Algorithm table

catalog_tag_id: Mapped[int]: foreign key into CatalogTag table

algo_: Mapped[Algorithm]: Access to associated Algorithm

catalog_tag_: Mapped[CatalogTag]: Access to associated CatalogTag

estimators_: Mapped[list[Estimator]]: Access to list of associated Estimator

col_names_for_table = ['id', 'name', 'algo_id', 'catalog_tag_id', 'path']: column names to use when printing the table

classmethod validate_model(path, algo, catalog_tag)

Validate that the model is appropriate for the Algorithm and CatalogTag

Parameters:

path (Path) – File with the data
algo (Algorithm) – Algorithm in question
catalog_tag (CatalogTag) – Catalog tag in question

Return type:

None
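The idea of a pickled model carrying its own metadata, and of validating that metadata against an Algorithm and CatalogTag, can be sketched with the standard library alone. This does not use or reproduce the rail.core.model.Model API: the dict layout, the key names, and both helper functions are hypothetical.

```python
import pickle


def save_wrapped_model(path, payload, creation_class_name, catalog_tag_name):
    """Pickle a payload together with metadata (hypothetical layout)."""
    wrapped = {
        "creation_class_name": creation_class_name,
        "catalog_tag": catalog_tag_name,
        "data": payload,
    }
    with open(path, "wb") as fout:
        pickle.dump(wrapped, fout)


def check_wrapped_model(path, expected_class_name, expected_catalog_tag) -> None:
    """Raise ValueError if the stored metadata does not match; return None otherwise."""
    # Only unpickle files from a trusted source: pickle can execute arbitrary code.
    with open(path, "rb") as fin:
        wrapped = pickle.load(fin)
    if wrapped["creation_class_name"] != expected_class_name:
        raise ValueError(
            f"Model was created by {wrapped['creation_class_name']}, "
            f"expected {expected_class_name}"
        )
    if wrapped["catalog_tag"] != expected_catalog_tag:
        raise ValueError(
            f"Model was trained for {wrapped['catalog_tag']}, "
            f"expected {expected_catalog_tag}"
        )
```

Like validate_model, the check returns None on success and signals a mismatch by raising.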

Algorithm

class rail_pz_service.db.Algorithm(**kwargs)

Algorithm is a wrapper for a specific RAIL class that implements a particular p(z) estimation algorithm.

This just defines the particular Python class implementing the algorithm. The selection of a particular trained Model instance and any non-default parameters used to initialize an Estimator are handled in their own classes.

pydantic_mode_class: alias of Algorithm

id: Mapped[int]: primary key

name: Mapped[str]: Name for this Algorithm, unique

class_name: Mapped[str]: Name for the Python class implementing the algorithm

estimators_: Mapped[list[Estimator]]: Access to list of associated Estimator

models_: Mapped[list[Model]]: Access to list of associated Model

col_names_for_table = ['id', 'name', 'class_name']: column names to use when printing the table
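If class_name holds a fully qualified dotted path (an assumption; the stored format is not specified here), the class can be resolved at runtime with importlib. The load_class helper below is hypothetical and is demonstrated on a standard-library class rather than a RAIL one.

```python
import importlib


def load_class(class_name: str) -> type:
    """Resolve a dotted 'package.module.ClassName' string to the class object."""
    module_name, _, attr_name = class_name.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected a dotted path, got {class_name!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)
```

For instance, load_class("collections.OrderedDict") returns the OrderedDict class itself, which can then be instantiated as usual.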

CatalogTag

class rail_pz_service.db.CatalogTag(**kwargs)

Defines what kind of catalog we are analyzing data from, specifically what to expect for the names of the magnitude columns.

This is implemented in the rail.utils.catalog_utils module, which uses a catalog tag to set the default parameters for RAIL modules to match the catalog.

pydantic_mode_class: alias of CatalogTag

id: Mapped[int]: primary key

name: Mapped[str]: Name for this CatalogTag, unique

class_name: Mapped[str]: Name for the Python class implementing the CatalogTag

estimators_: Mapped[list[Estimator]]: Access to list of associated Estimator

models_: Mapped[list[Model]]: Access to list of associated Model

datasets_: Mapped[list[Dataset]]: Access to list of associated Dataset

col_names_for_table = ['id', 'name', 'class_name']: column names to use when printing the table
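As a rough illustration of how a catalog tag can pin down magnitude column names, the sketch below builds them from a naming template and a list of bands. The magnitude_columns helper, the template string, and the band list are hypothetical examples; they are not taken from rail.utils.catalog_utils.

```python
def magnitude_columns(template: str, bands: str) -> list[str]:
    """Expand a naming template once per band, one character per band."""
    return [template.format(band=band) for band in bands]
```

Here magnitude_columns("mag_{band}_lsst", "ugrizy") yields one column name per band, starting with "mag_u_lsst"; a different tag would supply a different template or band list.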