Data Loading
A class for loading data for use in other Ionworks packages.
- class ionworksdata.load.DataLoader(time_series: pd.DataFrame | pl.DataFrame | dict, steps: pd.DataFrame | pl.DataFrame | dict | None = None, **kwargs)
Bases: object
Unified data loader for time-series and OCP data.
Handles two modes:
With steps: loads time-series data with step information for simulation, experiment generation, etc.
Without steps: loads simple tabular data (e.g. OCP curves with Capacity and Voltage columns).
Post-load preprocessing is configured via the transforms dict option.
The data and steps attributes return Polars DataFrames. Use loader.data.to_pandas() or loader.steps.to_pandas() for pandas. Constructors and property setters accept both pandas and Polars; inputs are converted to Polars internally.
Parameters
- time_series : pd.DataFrame | pl.DataFrame | dict
The data to load. Can be a pandas or Polars DataFrame, or a dict.
- steps : pd.DataFrame | pl.DataFrame | dict | None, optional
Step information. When None, the loader operates in simple (no-steps) mode.
- **kwargs
Options passed directly or via an options dict. Supported keys:
When steps are provided:
first_step, last_step : str | int
first_step_dict, last_step_dict : dict (deprecated)
Always:
- capacity_column : str
Name of the column in time_series to use as the capacity axis (copied to "Capacity [A.h]").
- transforms : dict with any of:
  - gitt_to_ocp : bool
    See transform_gitt_to_ocp() for details.
  - rest_to_ocp : bool
    See transform_rest_to_ocp() for details.
  - sort : bool
    See sort_capacity_and_ocp() for details.
  - remove_duplicates : bool
    See remove_duplicate_ocp() for details.
  - remove_extremes : bool
    See remove_ocp_extremes() for details.
  - filters : dict
    See filter_data() for details.
  - interpolate : float | np.ndarray
    See interpolate_data() for details.
  - keep_first_ocp_point : bool
    If True, prepend the first point (see transform_gitt_to_ocp() and transform_rest_to_ocp()). Default False. Ignored if gitt_to_ocp and rest_to_ocp are both False.
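As a rough illustration of how the keys above fit together, the following sketch builds an options dictionary for extracting an OCP curve from rest steps. Only the key names come from the list above; the column name and the chosen values are assumptions made for the example, and the call shown in the final comment is an assumed pattern rather than a verified one.

```python
# Illustrative values only; every key is taken from the documented list.
options = {
    # Hypothetical column name; use whichever capacity column your data has.
    "capacity_column": "Discharge capacity [A.h]",
    "transforms": {
        "rest_to_ocp": True,           # take OCP points from rest steps
        "keep_first_ocp_point": True,  # only meaningful with gitt_to_ocp/rest_to_ocp
        "sort": True,
        "remove_duplicates": True,
        "remove_extremes": False,
    },
}

# Assumed call pattern (not verified here):
# loader = DataLoader(time_series, steps, options=options)
```

Because rest_to_ocp relies on step information, a configuration like this only makes sense when steps are provided.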
- __getitem__(key: str) -> Series
- __init__(time_series: pd.DataFrame | pl.DataFrame | dict, steps: pd.DataFrame | pl.DataFrame | dict | None = None, **kwargs)
- calculate_dQdU_cutoff(method: str = 'explicit', show_plot: bool = False, options: dict | None = None) -> float
Calculate the cut-off for dQdU based on the data.
Parameters
- method : str, optional
Method to use for calculating the cut-off. Options are:
  - "explicit" (default): uses an explicit method based on the data range
  - "quantile": uses a quantile-based method
  - "peaks": uses a peak-detection method
- show_plot : bool, optional
Whether to show a plot of the dQdU values with the cut-off.
- options : dict | None, optional
Dictionary of options to pass to the method.
Returns
- float
Cut-off for dQdU.
- calculate_dUdQ_cutoff(method: str = 'explicit', show_plot: bool = False, options: dict | None = None) -> float
Calculate the cut-off for dUdQ based on the data.
Parameters
- method : str, optional
Method to use for calculating the cut-off. Options are:
  - "explicit" (default): uses an explicit method based on the data range
  - "quantile": uses a quantile-based method
  - "peaks": uses a peak-detection method
- show_plot : bool, optional
Whether to show a plot of the dUdQ values with the cut-off.
- options : dict | None, optional
Dictionary of options to pass to the method.
Returns
- float
Cut-off for dUdQ.
- copy() -> DataLoader
Create a copy of the DataLoader instance.
- property data: DataFrame
Time-series data as a Polars DataFrame. Use .data.to_pandas() for pandas.
- property end_idx: int
- static filter_data(data: DataFrame, filters: dict) -> DataFrame
Filter a Polars DataFrame using the specified filter functions.
Each key in filters is a column name; the value must include filter_type (e.g. "savgol") and any parameters. Used by the filters transform option.
- classmethod from_db(measurement_id: str, options: dict | None = None, use_cache: bool = True, client=None) -> DataLoader
Load data from the Ionworks database (lazy loading).
Data is fetched on demand: accessing .initial_voltage or .steps loads only the steps table (small payload) via client.cell_measurement.steps(measurement_id). Accessing .data also loads the time series via client.cell_measurement.time_series(measurement_id), after the steps if they are needed for slicing. This allows reading, for example, the initial voltage without downloading the full time series.
Parameters
- measurement_id : str
The ID of the measurement to load from the database.
- options : dict | None, optional
Options to pass to the DataLoader constructor.
- use_cache : bool, optional
If True (default), use the local file cache to avoid repeated API calls. Set to False to force a fresh load from the database.
- client : ionworks.Ionworks | None, optional
Pre-configured Ionworks client. If not provided, a default Ionworks() client is created (using environment variables).
Returns
DataLoader
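The on-demand behaviour described for from_db() can be pictured with a small stand-alone sketch. This is not the library's implementation: LazyMeasurement, the two fetch callables, and the "Start voltage [V]" key are hypothetical names introduced only to show the pattern of loading the small steps table first and the time series only when .data is accessed.

```python
class LazyMeasurement:
    """Toy lazy loader: each table is fetched at most once, on first access."""

    def __init__(self, fetch_steps, fetch_time_series):
        self._fetch_steps = fetch_steps
        self._fetch_time_series = fetch_time_series
        self._steps = None
        self._data = None

    @property
    def steps(self):
        if self._steps is None:
            self._steps = self._fetch_steps()
        return self._steps

    @property
    def initial_voltage(self):
        # Needs only the steps table; "Start voltage [V]" is a hypothetical key.
        return self.steps[0]["Start voltage [V]"]

    @property
    def data(self):
        if self._data is None:
            _ = self.steps  # steps first, in case they are needed for slicing
            self._data = self._fetch_time_series()
        return self._data


# Stand-ins for the remote calls; they record the order in which they run.
calls = []

def fake_fetch_steps():
    calls.append("steps")
    return [{"Start voltage [V]": 4.1}]

def fake_fetch_time_series():
    calls.append("time_series")
    return {"Time [s]": [0.0, 1.0], "Voltage [V]": [4.1, 4.0]}
```

Reading initial_voltage on such an object triggers only the steps fetch; the time-series fetch runs the first time data is read.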
- classmethod from_local(data_path, options=None, use_polars=True)
Load data from the local filesystem.
Parameters
- data_path : str
Path to the directory containing time_series.csv and, optionally, steps.csv.
- options : dict | None, optional
Options to pass to the DataLoader constructor.
- use_polars : bool, optional
If True (default), read CSV with Polars. If False, read with pandas (data is still stored as Polars internally).
Returns
DataLoader
- classmethod from_processed_data(data, steps, initial_voltage, start_idx, end_idx)
Create a DataLoader from already-processed data, bypassing __init__.
Parameters
- data : pd.DataFrame | pl.DataFrame
The processed time series data.
- steps : pd.DataFrame | pl.DataFrame | None
The processed steps data (or None).
- initial_voltage : float
The initial voltage value.
- start_idx : int
The start index for the data.
- end_idx : int
The end index for the data.
Returns
DataLoader
- generate_experiment(use_cv: bool = False) -> Experiment
Generate a PyBaMM experiment from the loaded step information.
- generate_interpolant() -> Interpolant
Generate a PyBaMM interpolant from the loaded step information.
- property initial_voltage
Initial voltage (from the first step or the end of the previous step). Lazy-loads the steps when the loader was created with from_db().
- static interpolate_data(data: DataFrame, knots: float | ndarray, x_column: str = 'Time [s]') -> DataFrame
Interpolate a Polars DataFrame using np.interp.
If knots is a float, data is resampled at that regular interval along x_column. If it is an array, data is interpolated at those knots. Used by the interpolate transform option.
Only numeric columns are interpolated. Non-numeric (e.g. Utf8/String) columns are skipped and do not appear in the returned DataFrame.
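The resampling rule can be sketched in plain Python on a dict of columns. This is a simplified stand-in, not the library code: it uses a hand-written linear interpolation in place of np.interp, assumes x_column is strictly increasing, and assumes that a float knots value means a grid starting at the first x value. The names linear_interp and resample are hypothetical.

```python
from bisect import bisect_right


def linear_interp(x_new, xs, ys):
    """Piecewise-linear interpolation, clamped at both ends (as np.interp does)."""
    if x_new <= xs[0]:
        return ys[0]
    if x_new >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x_new) - 1
    t = (x_new - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def resample(table, knots, x_column="Time [s]"):
    """Resample the numeric columns of a {name: list} table at the given knots."""
    xs = table[x_column]
    if isinstance(knots, (int, float)):
        # Assumption: a regular grid from the first x value, in steps of `knots`.
        n = int((xs[-1] - xs[0]) // knots)
        grid = [xs[0] + k * knots for k in range(n + 1)]
    else:
        grid = list(knots)
    out = {x_column: grid}
    for name, values in table.items():
        if name == x_column:
            continue
        # Non-numeric columns are skipped and do not appear in the result.
        if all(isinstance(v, (int, float)) for v in values):
            out[name] = [linear_interp(g, xs, values) for g in grid]
    return out


table = {
    "Time [s]": [0.0, 1.0, 3.0],
    "Voltage [V]": [4.0, 3.8, 3.4],
    "Step type": ["Rest", "Rest", "Rest"],
}
resampled = resample(table, 1.0)
```

In this example the string column "Step type" is dropped, and the voltage at 2 s is filled in linearly between the samples at 1 s and 3 s.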
- plot_data(show: bool = False) -> tuple[Figure, Axes]
Plot voltage vs time data from the loaded experiment.
- static remove_duplicate_ocp(data: DataFrame, capacity_column_name='Capacity [A.h]') -> DataFrame
Remove duplicate capacity and voltage points.
Keeps the first occurrence of each unique capacity and each unique voltage. Used by the remove_duplicates transform option.
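One possible reading of "first occurrence of each unique capacity and each unique voltage" is two keep-first passes, one per column. The sketch below follows that reading on a list of (capacity, voltage) tuples; the helper names are hypothetical, and the actual method may combine the two conditions differently.

```python
def keep_first(rows, key_index):
    """Keep the first row for each distinct value at position key_index."""
    seen = set()
    kept = []
    for row in rows:
        if row[key_index] not in seen:
            seen.add(row[key_index])
            kept.append(row)
    return kept


def remove_duplicate_points(rows):
    """Drop repeated capacities first, then repeated voltages (assumed order)."""
    return keep_first(keep_first(rows, 0), 1)


points = [(0.0, 4.2), (0.0, 4.1), (0.5, 4.1), (1.0, 4.1), (1.5, 3.9)]
unique_points = remove_duplicate_points(points)
```

Dropping repeated values this way leaves both columns strictly unique, which matters if the curve is later inverted or interpolated.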
- static remove_ocp_extremes(data: DataFrame) -> DataFrame
Remove OCP points at extremes where d²V/dQ² is zero.
Trims the capacity–voltage curve to the range where the second derivative of voltage with respect to capacity is non-zero. Used by the remove_extremes transform option.
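The trimming idea can be illustrated with finite differences. This sketch is an assumption about how such a trim might work, not the library's algorithm: it treats the change in slope between neighbouring segments as a proxy for d²V/dQ² (the two vanish at the same points when capacity is strictly increasing), and it introduces a tolerance tol because exact zeros are fragile in floating point. The function name is hypothetical.

```python
def trim_flat_extremes(capacity, voltage, tol=1e-12):
    """Keep the span between the first and last points with non-zero curvature."""
    # Slope of each segment; requires strictly increasing capacity.
    slopes = [
        (voltage[i + 1] - voltage[i]) / (capacity[i + 1] - capacity[i])
        for i in range(len(capacity) - 1)
    ]
    # Change in slope at each interior point (index i + 1 in the original data).
    curvature = [slopes[i + 1] - slopes[i] for i in range(len(slopes) - 1)]
    nonzero = [i + 1 for i, c in enumerate(curvature) if abs(c) > tol]
    if not nonzero:
        return capacity, voltage  # nothing to anchor the trim on
    lo, hi = nonzero[0], nonzero[-1]
    return capacity[lo : hi + 1], voltage[lo : hi + 1]


q = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
v = [4.0, 4.0, 4.0, 3.5, 3.0, 3.0]
q_trimmed, v_trimmed = trim_flat_extremes(q, v)
```

Here the flat plateaus at both ends are removed and only the sloped middle section, including the two points where the slope changes, is kept.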
- set_processed_internal_state(*, transforms=None, measurement_id=None, capacity_column=None, first_step=None, last_step=None, original_time_series=None, original_steps=None)
Set internal state when constructing from processed data. Used by from_processed_data().
original_time_series and original_steps may be pandas or Polars DataFrames (or None); they are stored as Polars internally for config export.
- slice_to_steps(first_step_idx: int, last_step_idx: int) -> DataLoader
Create a new DataLoader containing only the given step range.
Only the steps table needs to be loaded; the time-series data remains lazy (not loaded) when the source DataLoader is lazy.
Parameters
- first_step_idx : int
Row index of the first step in the steps table (0-based).
- last_step_idx : int
Row index of the last step in the steps table (0-based, inclusive).
Returns
DataLoader
- static sort_capacity_and_ocp(data: DataFrame) -> DataFrame
Sort OCP data so voltage is decreasing and capacity is increasing.
Ensures a single capacity column (normalized to start at 0 and non-decreasing), removes duplicates, and reverses rows if voltage is increasing. Used by the sort transform option.
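A minimal sketch of the ordering convention, under stated assumptions: rows are (capacity, voltage) tuples from a monotonic sweep, "voltage is increasing" is judged by comparing the first and last rows, and capacity is re-referenced as the absolute distance from the first row after any reversal. Duplicate removal is left out. The function name sort_ocp is hypothetical and the library may normalize capacity differently.

```python
def sort_ocp(rows):
    """Return rows with voltage decreasing and capacity starting at 0."""
    if not rows:
        return []
    # Reverse a charge-direction sweep so that voltage decreases down the table.
    if rows[-1][1] > rows[0][1]:
        rows = rows[::-1]
    q0 = rows[0][0]
    # Measure capacity from the first row so it starts at 0 and never decreases
    # (valid for a monotonic sweep).
    return [(abs(q - q0), v) for q, v in rows]


charge_sweep = [(0.0, 3.0), (1.0, 3.5), (2.0, 4.0)]
sorted_rows = sort_ocp(charge_sweep)
```

With this convention a charge sweep and a discharge sweep of the same curve end up in the same layout, which keeps the later transform steps direction-agnostic.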
- property start_idx: int
- property steps: DataFrame | None
Step summary as a Polars DataFrame, or None. Use .steps.to_pandas() for pandas.
- to_config(filter_data: bool = True) -> dict
Convert the DataLoader back to parser configuration format.
Parameters
- filter_data : bool, optional
If True (default) and steps are present, saves the filtered data rather than the original unfiltered data.
Returns
- dict
Configuration dictionary that can recreate this DataLoader.
- to_local() -> DataLoader
Convert a DB-backed loader into a fully local one.
Ensures all lazy data (steps and time series) is fetched, then removes the measurement ID so that to_config() serialises the data inline instead of as a db: reference.
Returns self for chaining, e.g. loader.to_local().to_config().
- transform_gitt_to_ocp()
Extract OCP from GITT rest steps: take the last data point of each rest.
Filters steps to those with Label == "GITT" and Step type == "Rest", computes cumulative net capacity (discharge/charge reset per step), then builds one OCP point (capacity, voltage) at the end of each such rest. If transforms["keep_first_ocp_point"] is True, prepends the first row of the first GITT step as an extra OCP point. Replaces data with the OCP table and clears steps.
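The selection rule can be sketched on plain Python structures. This is a simplified illustration, not the library's code: it assumes the capacity list already holds the cumulative net capacity, and the "Start index" and "End index" keys are hypothetical names for where each step begins and ends in the time series. Only the "Label" and "Step type" fields come from the description above.

```python
def gitt_rests_to_ocp(steps, capacity, voltage, keep_first_ocp_point=False):
    """Build (capacity, voltage) OCP points from the end of each GITT rest."""
    gitt_steps = [s for s in steps if s["Label"] == "GITT"]
    rests = [s for s in gitt_steps if s["Step type"] == "Rest"]
    # One point per rest: the last sample of that rest.
    points = [(capacity[s["End index"]], voltage[s["End index"]]) for s in rests]
    if keep_first_ocp_point and gitt_steps:
        first = gitt_steps[0]["Start index"]  # first row of the first GITT step
        points.insert(0, (capacity[first], voltage[first]))
    return points


steps = [
    {"Label": "GITT", "Step type": "Discharge", "Start index": 0, "End index": 2},
    {"Label": "GITT", "Step type": "Rest", "Start index": 3, "End index": 5},
    {"Label": "Cycling", "Step type": "Rest", "Start index": 6, "End index": 7},
]
capacity = [0.0, 0.1, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]
voltage = [4.2, 4.0, 3.9, 3.95, 3.98, 4.0, 4.0, 4.0]
ocp = gitt_rests_to_ocp(steps, capacity, voltage, keep_first_ocp_point=True)
```

Dropping the Label check from the first filter gives the behaviour described for transform_rest_to_ocp(), which uses every rest step; note that, per its description, that method prepends the first row of the whole time series instead.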
- transform_rest_to_ocp()
Extract OCP from all rest steps (no GITT label check).
Filters steps to those with Step type == "Rest" only. Useful when step type is available but GITT labels are missing or unreliable. Uses the same cumulative-capacity and OCP-building logic as transform_gitt_to_ocp(). If transforms["keep_first_ocp_point"] is True, prepends the first row of the time series as an extra OCP point. Replaces data with the OCP table and clears steps.