Data Loading¶

A class for loading data for use in other Ionworks packages.

Bases: object

Unified data loader for time-series and OCP data.

Handles two modes:

With steps: loads time-series data with step information for simulation, experiment generation, etc.
Without steps: loads simple tabular data (e.g. OCP curves with Capacity and Voltage columns).

Post-load preprocessing is configured via the transforms dict option.

The data and steps attributes return Polars DataFrames. Use loader.data.to_pandas() or loader.steps.to_pandas() for pandas. Constructors and property setters accept both pandas and Polars; inputs are converted to Polars internally.

Parameters¶

time_series : pd.DataFrame | pl.DataFrame | dict

The data to load. Can be a Pandas/Polars DataFrame or a dict.

steps : pd.DataFrame | pl.DataFrame | dict | None, optional

Step information. When None, the loader operates in simple (no-steps) mode.

**kwargs

Options passed directly or via an options dict. Supported keys:

When steps are provided:

first_step, last_step : str | int

first_step_dict, last_step_dict : dict (deprecated)

Always:

capacity_column : str
Name of the column in time_series to use as the capacity axis (copied to "Capacity [A.h]").

transforms : dict with any of:

gitt_to_ocp : bool
See transform_gitt_to_ocp() for details.

rest_to_ocp : bool
See transform_rest_to_ocp() for details.

sort : bool
See sort_capacity_and_ocp() for details.

remove_duplicates : bool
See remove_duplicate_ocp() for details.

remove_extremes : bool
See remove_ocp_extremes() for details.

filters : dict
See filter_data() for details.

interpolate : float | np.ndarray
See interpolate_data() for details.

keep_first_ocp_point : bool
If True, prepend the first point (see transform_gitt_to_ocp() and transform_rest_to_ocp()). Default False. Ignored if gitt_to_ocp and rest_to_ocp are both False.
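
To show how these keys fit together, here is a minimal sketch of an options dictionary that requests OCP extraction from rest steps followed by some clean-up. The values and the source column name are made up for illustration; only the key names come from the list above.

```python
# Transform keys documented above; used here only to sanity-check the example.
SUPPORTED_TRANSFORMS = {
    "gitt_to_ocp", "rest_to_ocp", "sort", "remove_duplicates",
    "remove_extremes", "filters", "interpolate", "keep_first_ocp_point",
}

# Illustrative options only: the values are assumptions, not recommended defaults.
options = {
    # Hypothetical source column; the loader copies it to "Capacity [A.h]".
    "capacity_column": "Discharge capacity [A.h]",
    "transforms": {
        "rest_to_ocp": True,           # one OCP point per rest step
        "keep_first_ocp_point": True,  # only meaningful with gitt_to_ocp/rest_to_ocp
        "sort": True,
        "remove_duplicates": True,
    },
}

# Catch misspelled transform names before handing the dict to the loader.
unknown = set(options["transforms"]) - SUPPORTED_TRANSFORMS
```

Per the description of **kwargs, such a dictionary can be supplied either as individual keyword arguments or as a single options dict.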

__getitem__(key: str) → Series¶

__init__(time_series: pd.DataFrame | pl.DataFrame | dict, steps: pd.DataFrame | pl.DataFrame | dict | None = None, **kwargs)¶

calculate_dQdU_cutoff(method: str = 'explicit', show_plot: bool = False, options: dict | None = None) → float¶

Calculate the cut-off for dQdU based on the data.

Parameters¶

method : str, optional
Method to use for calculating the cut-off. Options are:
- “explicit” (default): uses an explicit method based on the data range
- “quantile”: uses a quantile-based method
- “peaks”: uses a peak detection method

show_plot : bool, optional
Whether to show a plot of the dQdU values with the cut-off.

options : dict, optional
Dictionary of options to pass to the method.

Returns¶

float: Cut-off for dQdU

calculate_dUdQ_cutoff(method: str = 'explicit', show_plot: bool = False, options: dict | None = None) → float¶

Calculate the cut-off for dUdQ based on the data.

Parameters¶

method : str, optional
Method to use for calculating the cut-off. Options are:
- “explicit” (default): uses an explicit method based on the data range
- “quantile”: uses a quantile-based method
- “peaks”: uses a peak detection method

show_plot : bool, optional
Whether to show a plot of the dUdQ values with the cut-off.

options : dict, optional
Dictionary of options to pass to the method.

Returns¶

float: Cut-off for dUdQ

copy() → DataLoader¶

Create a copy of the DataLoader instance.

property data: DataFrame¶

Time-series data as a Polars DataFrame. Use .data.to_pandas() for pandas.

property end_idx: int¶

static filter_data(data: DataFrame, filters: dict) → DataFrame¶

Filter a Polars DataFrame using the specified filter functions.

Each key in filters is a column name; the value must include filter_type (e.g. "savgol") and any parameters. Used by the filters transform option.

classmethod from_db(measurement_id: str, options: dict | None = None, use_cache: bool = True, client=None) → DataLoader¶

Load data from the Ionworks database (lazy loading).

Data is fetched on demand: accessing .initial_voltage or .steps loads only the steps table (small payload) via client.cell_measurement.steps(measurement_id). Accessing .data loads the time series as well via client.cell_measurement.time_series(measurement_id), after steps if needed for slicing. This allows reading e.g. initial voltage without downloading the full time series.

Parameters¶

measurement_id : str
The ID of the measurement to load from the database.

options : dict | None, optional
Options to pass to the DataLoader constructor.

use_cache : bool, optional
If True (default), use the local file cache to avoid repeated API calls. Set to False to force a fresh load from the database.

client : ionworks.Ionworks | None, optional
Pre-configured Ionworks client. If not provided, a default Ionworks() client is created (using environment variables).

Returns¶

DataLoader
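
The lazy-loading behaviour described above can be pictured with a small stand-alone sketch. This is not the library's implementation: FakeClient, LazyLoader, and the "Start voltage [V]" column are hypothetical names used only to show that reading the initial voltage touches the steps table alone, while reading the data fetches the time series as well.

```python
class FakeClient:
    """Hypothetical stand-in for a database client; counts fetches per table."""

    def __init__(self):
        self.calls = {"steps": 0, "time_series": 0}

    def steps(self, measurement_id):
        self.calls["steps"] += 1
        return [{"Step type": "Rest", "Start voltage [V]": 3.7}]

    def time_series(self, measurement_id):
        self.calls["time_series"] += 1
        return {"Time [s]": [0.0, 1.0], "Voltage [V]": [3.7, 3.7]}


class LazyLoader:
    """Illustrative lazy loader: each table is fetched once, on first access."""

    def __init__(self, measurement_id, client):
        self._id = measurement_id
        self._client = client
        self._steps = None
        self._data = None

    @property
    def steps(self):
        if self._steps is None:
            self._steps = self._client.steps(self._id)
        return self._steps

    @property
    def initial_voltage(self):
        # Needs only the (small) steps table.
        return self.steps[0]["Start voltage [V]"]

    @property
    def data(self):
        if self._data is None:
            _ = self.steps  # steps first, since they may be needed for slicing
            self._data = self._client.time_series(self._id)
        return self._data


client = FakeClient()
loader = LazyLoader("measurement-123", client)

v0 = loader.initial_voltage
calls_after_voltage = dict(client.calls)  # time series not fetched yet

_ = loader.data
calls_after_data = dict(client.calls)     # now both tables have been fetched once
```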

classmethod from_local(data_path, options=None, use_polars=True)¶

Load data from local filesystem.

Parameters¶

data_path : str
Path to the directory containing time_series.csv and, optionally, steps.csv.

options : dict | None, optional
Options to pass to the DataLoader constructor.

use_polars : bool, optional
If True (default), read the CSV files with Polars. If False, read them with pandas (data is still stored as Polars internally).

Returns¶

DataLoader

classmethod from_processed_data(data, steps, initial_voltage, start_idx, end_idx)¶

Create a DataLoader from already-processed data, bypassing __init__.

Parameters¶

data : pd.DataFrame | pl.DataFrame
The processed time series data.

steps : pd.DataFrame | pl.DataFrame | None
The processed steps data (or None).

initial_voltage : float
The initial voltage value.

start_idx : int
The start index for the data.

end_idx : int
The end index for the data.

Returns¶

DataLoader

generate_experiment(use_cv: bool = False) → Experiment¶

Generate a PyBaMM experiment from the loaded step information.

generate_interpolant() → Interpolant¶

Generate a PyBaMM interpolant from the loaded step information.

property initial_voltage¶

Initial voltage (from the first step or the end of the previous step). Lazy-loads the steps table when the loader was created with from_db().

static interpolate_data(data: DataFrame, knots: float | ndarray, x_column: str = 'Time [s]') → DataFrame¶

Interpolate a Polars DataFrame using np.interp.

If knots is a float, data is resampled at that regular interval along x_column. If an array, interpolated at those knots. Used by the interpolate transform option.

Only numeric columns are interpolated. Non-numeric (e.g. Utf8/String) columns are skipped and do not appear in the returned DataFrame.
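
The two knots modes can be sketched without any dependencies. The version below is a plain-Python stand-in, not the library code: it works on a dict of lists rather than a Polars DataFrame, uses a hand-written linear interpolation in place of np.interp, and assumes that a float knot spacing produces a regular grid starting at the first x value (how the end point is handled is an assumption).

```python
from bisect import bisect_right


def interp(x, xp, fp):
    """Piecewise-linear interpolation at x; clamps to the end values.

    xp must be sorted in increasing order.
    """
    if x <= xp[0]:
        return fp[0]
    if x >= xp[-1]:
        return fp[-1]
    i = bisect_right(xp, x)  # xp[i - 1] <= x < xp[i]
    frac = (x - xp[i - 1]) / (xp[i] - xp[i - 1])
    return fp[i - 1] + (fp[i] - fp[i - 1]) * frac


def interpolate_table(data, knots, x_column="Time [s]"):
    """Interpolate every numeric column of a dict-of-lists table."""
    x = data[x_column]
    if isinstance(knots, (int, float)):
        # Float: regular grid with spacing `knots` (assumed to start at x[0]).
        n = int((x[-1] - x[0]) / knots)
        grid = [x[0] + k * knots for k in range(n + 1)]
    else:
        # Array-like: interpolate exactly at the given knots.
        grid = list(knots)
    out = {x_column: grid}
    for name, column in data.items():
        if name == x_column:
            continue
        # Non-numeric columns are skipped and do not appear in the result.
        if all(isinstance(value, (int, float)) for value in column):
            out[name] = [interp(g, x, column) for g in grid]
    return out


table = {
    "Time [s]": [0.0, 2.0, 4.0],
    "Voltage [V]": [4.0, 3.5, 3.0],
    "Step type": ["Rest", "Rest", "Rest"],  # string column, dropped
}
resampled = interpolate_table(table, 1.0)       # regular 1 s grid
at_knots = interpolate_table(table, [0.5, 3.5])  # explicit knots
```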

plot_data(show: bool = False) → tuple[Figure, Axes]¶

Plot voltage vs time data from the loaded experiment.

static remove_duplicate_ocp(data: DataFrame, capacity_column_name='Capacity [A.h]') → DataFrame¶

Remove duplicate capacity and voltage points.

Keeps first occurrence of each unique capacity and each unique voltage. Used by the remove_duplicates transform option.
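
One way to read "first occurrence of each unique capacity and each unique voltage" is as two successive keep-first passes. The sketch below makes that assumption explicit (capacity first, then voltage) and works on plain (capacity, voltage) tuples; the actual method operates on a Polars DataFrame and may order the passes differently.

```python
def keep_first_unique(rows, index):
    """Keep the first row for each distinct value found at position `index`."""
    seen = set()
    kept = []
    for row in rows:
        if row[index] not in seen:
            seen.add(row[index])
            kept.append(row)
    return kept


def remove_duplicates(rows):
    """Assumed order: de-duplicate on capacity (0), then on voltage (1)."""
    return keep_first_unique(keep_first_unique(rows, 0), 1)


rows = [
    (0.0, 4.2),
    (0.0, 4.1),  # repeated capacity, dropped in the first pass
    (0.5, 4.1),
    (1.0, 4.1),  # repeated voltage, dropped in the second pass
    (1.5, 3.9),
]
cleaned = remove_duplicates(rows)
```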

static remove_ocp_extremes(data: DataFrame) → DataFrame¶

Remove OCP points at extremes where d²V/dQ² is zero.

Trims the capacity–voltage curve to the range where the second derivative of voltage with respect to capacity is non-zero. Used by the remove_extremes transform option.
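
As a rough illustration of the trimming idea, the sketch below estimates d²V/dQ² with a three-point finite difference and keeps the rows between the first and last points where it is non-zero. The finite-difference formula, the tolerance, and the choice to keep the bounding points themselves are all assumptions made for this example, not details taken from the method.

```python
def second_derivative(q, v, i):
    """Three-point estimate of d2V/dQ2 at interior index i (q strictly increasing)."""
    h1 = q[i] - q[i - 1]
    h2 = q[i + 1] - q[i]
    return 2.0 * ((v[i + 1] - v[i]) / h2 - (v[i] - v[i - 1]) / h1) / (h1 + h2)


def trim_flat_extremes(q, v, tol=1e-12):
    """Keep rows from the first to the last point with |d2V/dQ2| > tol."""
    nonzero = [
        i for i in range(1, len(q) - 1)
        if abs(second_derivative(q, v, i)) > tol
    ]
    if not nonzero:
        return q, v  # nothing to trim against; returned unchanged (assumption)
    first, last = nonzero[0], nonzero[-1]
    return q[first:last + 1], v[first:last + 1]


# Flat plateaus at both ends with a linear drop in between.
capacity = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
voltage = [4.0, 4.0, 4.0, 3.5, 3.0, 3.0, 3.0]
trimmed_q, trimmed_v = trim_flat_extremes(capacity, voltage)
```

In this toy curve the second derivative is non-zero only at the two corners, so the flat plateaus on either side are removed.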

set_processed_internal_state(*, transforms=None, measurement_id=None, capacity_column=None, first_step=None, last_step=None, original_time_series=None, original_steps=None)¶

Set internal state when constructing from processed data. Used by from_processed_data.

original_time_series and original_steps may be pandas or Polars DataFrames (or None); they are stored as Polars internally for config export.

slice_to_steps(first_step_idx: int, last_step_idx: int) → DataLoader¶

Create a new DataLoader containing only the given step range.

Only requires the steps table to be loaded — the time-series data remains lazy (not loaded) when the source DataLoader is lazy.

Parameters¶

first_step_idx : int
Row index of the first step in the steps table (0-based).

last_step_idx : int
Row index of the last step in the steps table (0-based, inclusive).

Returns¶

DataLoader
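
Because last_step_idx is inclusive, the range differs from a normal Python slice by one. A small sketch with a hypothetical steps table held as a list of dicts (not the library's data structure) shows the indexing convention:

```python
def slice_steps(steps, first_step_idx, last_step_idx):
    """Return the rows first_step_idx..last_step_idx, both ends included."""
    return steps[first_step_idx:last_step_idx + 1]


# Hypothetical steps table; only the row order matters here.
steps = [
    {"Step type": "Rest"},
    {"Step type": "Discharge"},
    {"Step type": "Rest"},
    {"Step type": "Charge"},
]

selected = slice_steps(steps, 1, 2)  # rows 1 and 2
single = slice_steps(steps, 3, 3)    # first == last selects one step
```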

static sort_capacity_and_ocp(data: DataFrame) → DataFrame¶

Sort OCP data so voltage is decreasing and capacity is increasing.

Ensures a single capacity column (normalized to start at 0 and non-decreasing), removes duplicates, and reverses rows if voltage is increasing. Used by the sort transform option.
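
A simplified sketch of the orientation step is given below. It assumes that capacity is re-measured from the first row after any reversal, which is one way to make it start at 0 and increase; the real method may normalise differently, and the duplicate removal it also performs is left out here.

```python
def sort_ocp(capacity, voltage):
    """Orient an OCP curve so voltage decreases and capacity starts at 0."""
    rows = list(zip(capacity, voltage))
    if rows[0][1] < rows[-1][1]:
        # Voltage is increasing (e.g. a charge curve): reverse the row order.
        rows.reverse()
    q0 = rows[0][0]
    # Assumption: measure capacity relative to the first row.
    new_capacity = [abs(q - q0) for q, _ in rows]
    new_voltage = [v for _, v in rows]
    return new_capacity, new_voltage


# Charge-direction data: voltage rises with capacity.
q_sorted, v_sorted = sort_ocp([0.0, 0.5, 1.0], [3.0, 3.5, 4.0])

# Discharge data with a capacity offset: only the offset is removed.
q_shifted, v_shifted = sort_ocp([2.0, 2.5, 3.0], [4.0, 3.5, 3.0])
```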

property start_idx: int¶

property steps: DataFrame | None¶

Step summary as a Polars DataFrame, or None. Use .steps.to_pandas() for pandas.

to_config(filter_data: bool = True) → dict¶

Convert the DataLoader back to parser configuration format.

Parameters¶

filter_data : bool, optional
If True (default) and steps are present, saves the filtered data rather than the original unfiltered data.

Returns¶

dict: Configuration dictionary that can recreate this DataLoader.

to_local() → DataLoader¶

Convert a DB-backed loader into a fully local one.

Ensures all lazy data (steps and time series) is fetched, then removes the measurement ID so that to_config() serialises the data inline instead of as a db: reference.

Returns self for chaining, e.g. loader.to_local().to_config().

transform_gitt_to_ocp()¶

Extract OCP from GITT rest steps: take the last data point of each rest.

Filters steps to those with Label == "GITT" and Step type == "Rest", computes cumulative net capacity (discharge/charge reset per step), then builds one OCP point (capacity, voltage) at the end of each such rest. If transforms["keep_first_ocp_point"] is True, prepends the first row of the first GITT step as an extra OCP point. Replaces data with the OCP table and clears steps.
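
The core of the extraction (pick the rest steps, then take the last data point of each) can be sketched as follows. This is a simplified illustration rather than the library code: the "End index" and "Voltage [V]" column names are hypothetical, and the time series is assumed to already carry a cumulative capacity column, so the cumulative-capacity computation is skipped. The require_gitt flag is an invented switch that mimics dropping the Label check.

```python
def rests_to_ocp(steps, data, require_gitt=True):
    """Return one (capacity, voltage) point from the end of each rest step."""
    points = []
    for step in steps:
        if step["Step type"] != "Rest":
            continue
        if require_gitt and step.get("Label") != "GITT":
            continue
        i = step["End index"]  # hypothetical: row of the step's last data point
        points.append((data["Capacity [A.h]"][i], data["Voltage [V]"][i]))
    return points


# Toy time series: two discharge pulses, each followed by a rest,
# then one extra rest that does not carry the GITT label.
data = {
    "Capacity [A.h]": [0.0, 0.1, 0.1, 0.1, 0.15, 0.2, 0.2, 0.2, 0.2],
    "Voltage [V]": [4.10, 4.00, 4.05, 4.08, 3.95, 3.90, 3.96, 3.99, 4.00],
}
steps = [
    {"Label": "GITT", "Step type": "Discharge", "End index": 1},
    {"Label": "GITT", "Step type": "Rest", "End index": 3},
    {"Label": "GITT", "Step type": "Discharge", "End index": 5},
    {"Label": "GITT", "Step type": "Rest", "End index": 7},
    {"Label": "Other", "Step type": "Rest", "End index": 8},
]

gitt_points = rests_to_ocp(steps, data)                      # labelled rests only
rest_points = rests_to_ocp(steps, data, require_gitt=False)  # every rest step
```

With the label check disabled, the final unlabelled rest contributes a third point, which is the situation the rest-only variant is meant for.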

transform_rest_to_ocp()¶

Extract OCP from all rest steps (no GITT label check).

Filters steps to those with Step type == "Rest" only. Useful when step type is available but GITT labels are missing or unreliable. Same cumulative-capacity and OCP-building logic as transform_gitt_to_ocp(). If transforms["keep_first_ocp_point"] is True, prepends the first row of the time series as an extra OCP point. Replaces data with the OCP table and clears steps.
