Data Loading

A class for loading data for use in other Ionworks packages.

class ionworksdata.load.DataLoader(time_series: pd.DataFrame | pl.DataFrame | dict, steps: pd.DataFrame | pl.DataFrame | dict | None = None, **kwargs)

Bases: object

Unified data loader for time-series and OCP data.

Handles two modes:

  • With steps: loads time-series data with step information for simulation, experiment generation, etc.

  • Without steps: loads simple tabular data (e.g. OCP curves with Capacity and Voltage columns).

Post-load preprocessing is configured via the transforms dict option.

The data and steps attributes return Polars DataFrames. Use loader.data.to_pandas() or loader.steps.to_pandas() for pandas. Constructors and property setters accept both pandas and Polars; inputs are converted to Polars internally.

Parameters

time_series : pd.DataFrame | pl.DataFrame | dict

The data to load. Can be a Pandas/Polars DataFrame or a dict.

steps : pd.DataFrame | pl.DataFrame | dict | None, optional

Step information. When None, the loader operates in simple (no-steps) mode.

**kwargs

Options passed directly or via an options dict. Supported keys:

When steps are provided:

  • first_step, last_step : str | int

  • first_step_dict, last_step_dict : dict (deprecated)

Always:

__init__(time_series: pd.DataFrame | pl.DataFrame | dict, steps: pd.DataFrame | pl.DataFrame | dict | None = None, **kwargs)
__getitem__(key: str) → Series
generate_experiment(use_cv: bool = False) → Experiment

Generate a PyBaMM experiment from the loaded step information.

plot_data(show: bool = False) → tuple[Figure, Axes]

Plot voltage vs time data from the loaded experiment.

static filter_data(data: DataFrame, filters: dict) → DataFrame

Filter a Polars DataFrame using the specified filter functions.

Each key in filters is a column name; the value must include filter_type (e.g. "savgol") and any parameters. Used by the filters transform option.
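The shape of the filters dict can be sketched as follows. This is only an illustration: the column name "Voltage [V]" and the parameter names window_length and polyorder are assumptions (they mirror scipy.signal.savgol_filter and may not match what the library accepts), and check_filters is a hypothetical helper, not part of ionworksdata.

```python
# Hypothetical filters dict: one entry per column to be filtered.
# "window_length" and "polyorder" are assumed parameter names.
filters = {
    "Voltage [V]": {"filter_type": "savgol", "window_length": 11, "polyorder": 3},
}


def check_filters(filters: dict) -> list[str]:
    """Return the columns whose spec lacks the required 'filter_type' key."""
    return [col for col, spec in filters.items() if "filter_type" not in spec]


missing = check_filters(filters)
```

An empty result means every column spec names a filter type, which is the one requirement stated above.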

static interpolate_data(data: DataFrame, knots: float | ndarray, x_column: str = 'Time [s]') → DataFrame

Interpolate a Polars DataFrame using np.interp.

If knots is a float, data is resampled at that regular interval along x_column. If an array, interpolated at those knots. Used by the interpolate transform option.

Only numeric columns are interpolated. Non-numeric (e.g. Utf8/String) columns are skipped and do not appear in the returned DataFrame.
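The behaviour described above can be sketched with plain NumPy on a dict of columns. This is not the library's implementation: resample is a hypothetical helper, the "Voltage [V]" and "Label" columns are made up, and including the final x value in the regular grid is an assumption.

```python
import numpy as np


def resample(columns: dict, knots, x_column: str = "Time [s]") -> dict:
    """Interpolate every numeric column onto new x values with np.interp."""
    # np.interp expects the original x values to be increasing.
    x = np.asarray(columns[x_column], dtype=float)
    if np.isscalar(knots):
        # A float is treated as a regular spacing along x_column.
        new_x = np.arange(x[0], x[-1] + 0.5 * knots, knots)
    else:
        # An array is used directly as the new x values.
        new_x = np.asarray(knots, dtype=float)
    out = {x_column: new_x}
    for name, values in columns.items():
        if name == x_column:
            continue
        arr = np.asarray(values)
        if not np.issubdtype(arr.dtype, np.number):
            continue  # non-numeric columns are left out of the result
        out[name] = np.interp(new_x, x, arr.astype(float))
    return out


raw = {
    "Time [s]": [0.0, 1.0, 2.0],
    "Voltage [V]": [4.0, 3.8, 3.6],
    "Label": ["a", "b", "c"],
}
resampled = resample(raw, 0.5)
```

Here the string column "Label" is absent from resampled, matching the note that non-numeric columns do not appear in the output.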

classmethod from_local(data_path, options=None, use_polars=True)

Load data from local filesystem.

Parameters

data_path : str

Path to the directory containing time_series.csv and optionally steps.csv files.

options : dict | None, optional

Options to pass to the DataLoader constructor.

use_polars : bool, optional

If True (default), read CSV with Polars. If False, read with Pandas (data is still stored as Polars internally).

Returns

DataLoader
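The expected directory layout can be illustrated with the standard library alone. In this sketch, describe_data_dir is a hypothetical helper (not part of ionworksdata), the CSV column names are assumed, and raising FileNotFoundError for a missing time_series.csv is an assumption about sensible behaviour, not documented library behaviour.

```python
import tempfile
from pathlib import Path


def describe_data_dir(data_path: str) -> str:
    """Report which loader mode a directory layout would correspond to."""
    root = Path(data_path)
    if not (root / "time_series.csv").is_file():
        raise FileNotFoundError("time_series.csv is required")
    # steps.csv is optional; without it only simple tabular data is available.
    return "with steps" if (root / "steps.csv").is_file() else "without steps"


with tempfile.TemporaryDirectory() as tmp:
    # Column names here are assumed for illustration.
    (Path(tmp) / "time_series.csv").write_text(
        "Capacity [A.h],Voltage [V]\n0.0,4.2\n1.0,3.0\n"
    )
    mode = describe_data_dir(tmp)
```

A directory prepared this way would then be passed as data_path to DataLoader.from_local.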

calculate_dQdU_cutoff(method: str = 'explicit', show_plot: bool = False, options: dict | None = None) → float

Calculate the cut-off for dQdU based on the data.

Parameters

method : str, optional

Method to use for calculating the cut-off. Options are:

  • "explicit" (default): uses an explicit method based on the data range

  • "quantile": uses a quantile-based method

  • "peaks": uses a peak detection method

show_plot : bool, optional

Whether to show a plot of the dQdU values with the cut-off.

options : dict, optional

Dictionary of options to pass to the method.

Returns

float

Cut-off for dQdU

calculate_dUdQ_cutoff(method: str = 'explicit', show_plot: bool = False, options: dict | None = None) → float

Calculate the cut-off for dUdQ based on the data.

Parameters

method : str, optional

Method to use for calculating the cut-off. Options are:

  • "explicit" (default): uses an explicit method based on the data range

  • "quantile": uses a quantile-based method

  • "peaks": uses a peak detection method

show_plot : bool, optional

Whether to show a plot of the dUdQ values with the cut-off.

options : dict, optional

Dictionary of options to pass to the method.

Returns

float

Cut-off for dUdQ

copy() → DataLoader

Create a copy of the DataLoader instance.

property data: DataFrame

Time-series data as a Polars DataFrame. Use .data.to_pandas() for pandas.

property end_idx: int
classmethod from_db(measurement_id: str, options: dict | None = None, use_cache: bool = True, client=None) → DataLoader

Load data from the Ionworks database (lazy loading).

Data is fetched on demand: accessing .initial_voltage or .steps loads only the steps table (small payload) via client.cell_measurement.steps(measurement_id). Accessing .data loads the time series as well via client.cell_measurement.time_series(measurement_id), after steps if needed for slicing. This allows reading e.g. initial voltage without downloading the full time series.

Parameters

measurement_id : str

The ID of the measurement to load from the database.

options : dict | None, optional

Options to pass to the DataLoader constructor.

use_cache : bool, optional

If True (default), use local file cache to avoid repeated API calls. Set to False to force a fresh load from the database.

client : ionworks.Ionworks | None, optional

Pre-configured Ionworks client. If not provided, a default Ionworks() client is created (using env vars).

Returns

DataLoader
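The lazy-loading order described above can be illustrated with a toy class. This is a sketch of the pattern only: LazyMeasurement, the fake fetch functions and the "Start voltage [V]" key are all hypothetical and stand in for the real client calls.

```python
class LazyMeasurement:
    """Toy stand-in for a DB-backed loader; not the ionworksdata class."""

    def __init__(self, fetch_steps, fetch_time_series):
        self._fetch_steps = fetch_steps
        self._fetch_time_series = fetch_time_series
        self._steps = None
        self._data = None

    @property
    def steps(self):
        if self._steps is None:
            self._steps = self._fetch_steps()  # small payload
        return self._steps

    @property
    def initial_voltage(self):
        # Only the steps table is needed here.
        return self.steps[0]["Start voltage [V]"]

    @property
    def data(self):
        if self._data is None:
            _ = self.steps  # steps first, in case they are needed for slicing
            self._data = self._fetch_time_series()
        return self._data


calls = []  # records which fetches actually happened, in order


def fake_steps():
    calls.append("steps")
    return [{"Start voltage [V]": 4.1}]


def fake_time_series():
    calls.append("time_series")
    return {"Time [s]": [0, 1], "Voltage [V]": [4.1, 4.0]}


loader = LazyMeasurement(fake_steps, fake_time_series)
v0 = loader.initial_voltage
calls_after_voltage = list(calls)
_ = loader.data
```

After reading initial_voltage only the steps fetch has run; the time-series fetch happens on the first access to data.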

classmethod from_processed_data(data, steps, initial_voltage, start_idx, end_idx)

Create a DataLoader from already-processed data, bypassing __init__.

Parameters

data : pd.DataFrame | pl.DataFrame

The processed time series data.

steps : pd.DataFrame | pl.DataFrame | None

The processed steps data (or None).

initial_voltage : float

The initial voltage value.

start_idx : int

The start index for the data.

end_idx : int

The end index for the data.

Returns

DataLoader

generate_interpolant() → Interpolant

Generate a PyBaMM interpolant from the loaded step information.

property initial_voltage

Initial voltage (from the first step or the end of the previous step). When the loader was created with from_db, the steps are lazy-loaded on first access.

static remove_duplicate_ocp(data: DataFrame, capacity_column_name='Capacity [A.h]') → DataFrame

Remove duplicate capacity and voltage points.

Keeps first occurrence of each unique capacity and each unique voltage. Used by the remove_duplicates transform option.
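One possible reading of this rule is sketched below in plain Python: a row is kept only if neither its capacity nor its voltage has appeared in an earlier kept row. drop_duplicate_points is a hypothetical helper, and the library may instead deduplicate the two columns one after the other, which can give slightly different results.

```python
def drop_duplicate_points(capacity, voltage):
    """Keep the first row for each unique capacity and each unique voltage."""
    seen_capacity, seen_voltage = set(), set()
    kept = []
    for q, v in zip(capacity, voltage):
        if q in seen_capacity or v in seen_voltage:
            continue  # either value already appeared in a kept row
        seen_capacity.add(q)
        seen_voltage.add(v)
        kept.append((q, v))
    return kept


points = drop_duplicate_points(
    [0.0, 0.0, 1.0, 2.0, 3.0],
    [4.2, 4.1, 4.0, 4.0, 3.8],
)
```

In this input the second row repeats a capacity and the fourth repeats a voltage, so both are dropped.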

static remove_ocp_extremes(data: DataFrame) → DataFrame

Remove OCP points at extremes where d²V/dQ² is zero.

Trims the capacity–voltage curve to the range where the second derivative of voltage with respect to capacity is non-zero. Used by the remove_extremes transform option.
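A minimal NumPy sketch of this kind of trimming is given below. It is an illustration under assumptions, not the library's code: trim_flat_ends is a hypothetical helper, the second derivative is estimated with two passes of np.gradient, and a small tolerance replaces the exact "non-zero" test.

```python
import numpy as np


def trim_flat_ends(capacity, voltage, tol=1e-12):
    """Keep the span between the first and last non-zero d2V/dQ2 estimates."""
    q = np.asarray(capacity, dtype=float)
    v = np.asarray(voltage, dtype=float)
    # Numerical second derivative of voltage with respect to capacity.
    d2 = np.gradient(np.gradient(v, q), q)
    nonzero = np.flatnonzero(np.abs(d2) > tol)
    if nonzero.size == 0:
        return q, v  # nothing to trim against; return the curve unchanged
    lo, hi = nonzero[0], nonzero[-1]
    return q[lo : hi + 1], v[lo : hi + 1]


q_trim, v_trim = trim_flat_ends(
    [0, 1, 2, 3, 4, 5, 6, 7],
    [4.2, 4.2, 4.2, 4.0, 3.6, 3.0, 3.0, 3.0],
)
```

Because the finite-difference estimate spreads curvature onto neighbouring points, a point next to each bend survives the trim; only the outermost flat points are removed.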

set_processed_internal_state(*, transforms=None, measurement_id=None, capacity_column=None, first_step=None, last_step=None, original_time_series=None, original_steps=None)

Set internal state when constructing from processed data. Used by from_processed_data.

original_time_series and original_steps may be pandas or Polars DataFrames (or None); they are stored as Polars internally for config export.

slice_to_steps(first_step_idx: int, last_step_idx: int) → DataLoader

Create a new DataLoader containing only the given step range.

Only the steps table needs to be loaded; the time-series data stays lazy (not loaded) when the source DataLoader is lazy.

Parameters

first_step_idx : int

Row index of the first step in the steps table (0-based).

last_step_idx : int

Row index of the last step in the steps table (0-based, inclusive).

Returns

DataLoader
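The 0-based, inclusive indexing differs from Python's usual half-open slices, so a small sketch on a plain list may help. slice_inclusive and the step names are hypothetical and only illustrate the index convention.

```python
def slice_inclusive(rows, first_step_idx: int, last_step_idx: int):
    """Select rows first_step_idx..last_step_idx, with both ends included."""
    return rows[first_step_idx : last_step_idx + 1]


step_types = ["Rest", "CC discharge", "Rest", "CC charge", "CV charge"]
selected = slice_inclusive(step_types, 1, 3)
```

With first_step_idx=1 and last_step_idx=3, three steps are selected, not two.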

static sort_capacity_and_ocp(data: DataFrame) → DataFrame

Sort OCP data so voltage is decreasing and capacity is increasing.

Ensures a single capacity column (normalized to start at 0 and non-decreasing), removes duplicates, and reverses rows if voltage is increasing. Used by the sort transform option.
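The orientation step can be sketched in plain Python. orient_ocp is a hypothetical helper; measuring capacity from the first row after any reversal is an assumption about how the normalization works, and duplicate removal is left out.

```python
def orient_ocp(capacity, voltage):
    """Return (capacity, voltage) with voltage decreasing and capacity from 0."""
    q, v = list(capacity), list(voltage)
    if v[0] < v[-1]:
        # Voltage rises along the rows (e.g. charge data): reverse the rows.
        q.reverse()
        v.reverse()
    # Re-measure capacity from the (possibly new) first row so it starts at 0
    # and increases along the rows.
    q0 = q[0]
    q = [abs(c - q0) for c in q]
    return q, v


q_sorted, v_sorted = orient_ocp([0.0, 1.0, 2.0], [3.0, 3.6, 4.2])
```

Here the input is a charge curve, so the rows are reversed and capacity is re-measured from the high-voltage end.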

property start_idx: int
property steps: DataFrame | None

Step summary as a Polars DataFrame, or None. Use .steps.to_pandas() for pandas.

to_config(filter_data: bool = True) → dict

Convert the DataLoader back to parser configuration format.

Parameters

filter_data : bool, optional

If True (default) and steps are present, saves the filtered data rather than the original unfiltered data.

Returns

dict

Configuration dictionary that can recreate this DataLoader.

to_local() → DataLoader

Convert a DB-backed loader into a fully local one.

Ensures all lazy data (steps and time series) is fetched, then removes the measurement ID so that to_config() serialises the data inline instead of as a db: reference.

Returns self for chaining, e.g. loader.to_local().to_config().

transform_gitt_to_ocp()

Extract OCP from GITT rest steps: take the last data point of each rest.

Filters steps to those with Label == "GITT" and Step type == "Rest", computes cumulative net capacity (discharge/charge reset per step), then builds one OCP point (capacity, voltage) at the end of each such rest. If transforms["keep_first_ocp_point"] is True, prepends the first row of the first GITT step as an extra OCP point. Replaces data with the OCP table and clears steps.

transform_rest_to_ocp()

Extract OCP from all rest steps (no GITT label check).

Filters steps to those with Step type == "Rest" only. Useful when step type is available but GITT labels are missing or unreliable. Same cumulative-capacity and OCP-building logic as transform_gitt_to_ocp(). If transforms["keep_first_ocp_point"] is True, prepends the first row of the time series as an extra OCP point. Replaces data with the OCP table and clears steps.
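The core of both transforms, taking the last data point of each selected rest step, can be sketched in plain Python. This is an illustration only: rest_end_points is a hypothetical helper, the "End index" key is an assumed way of locating the last row of a step, and the capacity list is assumed to already hold the cumulative net capacity.

```python
def rest_end_points(steps, capacity, voltage, label="GITT"):
    """Collect one (capacity, voltage) point at the end of each rest step.

    With label=None the Label check is skipped, as in the rest-only variant.
    """
    points = []
    for step in steps:
        if step["Step type"] != "Rest":
            continue
        if label is not None and step.get("Label") != label:
            continue
        i = step["End index"]  # assumed: row index of the step's last sample
        points.append((capacity[i], voltage[i]))
    return points


steps = [
    {"Label": "GITT", "Step type": "Constant current discharge", "End index": 2},
    {"Label": "GITT", "Step type": "Rest", "End index": 4},
    {"Label": "Cycling", "Step type": "Rest", "End index": 6},
]
capacity = [0.0, 0.05, 0.1, 0.1, 0.1, 0.1, 0.1]
voltage = [4.2, 4.1, 4.0, 4.05, 4.08, 4.08, 4.08]

gitt_points = rest_end_points(steps, capacity, voltage)
rest_points = rest_end_points(steps, capacity, voltage, label=None)
```

The GITT variant picks up only the labelled rest, while the rest-only variant also includes the rest from the "Cycling" step.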