Transforms
Short code snippets that transform your data
Python code written in a JupyterLab notebook often needs to run in the cloud. InfinStor lets you send the code in a JupyterLab cell to the cloud for execution.
What are InfinStor Transforms?
InfinStor Transforms are short Python code snippets that transform your training data.
Capture Transform
How do you capture an InfinStor Transform?
The first step in remotely executing Python code from a cell in your JupyterLab notebook is to capture it, along with its execution environment, and save it in your InfinStor account as an InfinStor Transform.
A transform may execute in a variety of environments: in an IPython kernel on your JupyterLab server machine, in a single VM in the cloud, or in a cluster of machines in the cloud. For this reason, the Python code in your JupyterLab cell must not read data from the local file system. Its input data should be stored in the cloud, preferably in a cloud object store.
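Since the same cell may later run on a remote VM or cluster, it can help to check that every input path points at an object store rather than the local disk. The sketch below uses only the Python standard library. The helper name `require_cloud_uri` and the set of accepted URI schemes are assumptions made for illustration; they are not part of InfinStor.

```python
from urllib.parse import urlparse

# Assumed set of object-store URI schemes; adjust for your cloud provider.
CLOUD_SCHEMES = {"s3", "gs", "abfs", "abfss"}


def require_cloud_uri(uri: str) -> str:
    """Return `uri` unchanged if it names a cloud object, else raise ValueError."""
    parsed = urlparse(uri)
    # Local paths have an empty scheme; file:// URIs have the scheme "file".
    if parsed.scheme not in CLOUD_SCHEMES:
        raise ValueError(f"{uri!r} is not a cloud object store URI")
    # For bucket-style URIs the bucket name is parsed as the network location.
    if not parsed.netloc:
        raise ValueError(f"{uri!r} has no bucket name")
    return uri
```

Calling `require_cloud_uri("s3://my-bucket/train/data.csv")` returns the URI, while a local path such as `/home/user/data.csv` raises `ValueError` before the cell is ever captured.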
The slides below demonstrate a step-by-step process of capturing InfinStor Transforms on JupyterLab.




Run Transform Immediately
How do you run an InfinStor Transform immediately in the cloud?
There are four choices for input data in the dialog that pops up:
- InfinSnap — Snapshot of the state of a bucket at a specific point in time.
- InfinSlice — Slice of data that was ingested between a start time and an end time.
- No Input Data — Useful for transforms that perform their own I/O.
- MLflow Artifact — Artifacts from a previous MLflow run can be used as input data for this transform execution.
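The difference between an InfinSnap and an InfinSlice can be illustrated with a small in-memory model. This is only a sketch of the two selection rules described above, not InfinStor's implementation: the `ObjectVersion` record, the function names, and the choice of inclusive or exclusive interval bounds are all assumptions, and object deletions are ignored.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ObjectVersion:
    key: str               # object name within the bucket (hypothetical model)
    ingested_at: datetime  # when this version was written


def snapshot(versions, at):
    """Snapshot-style selection: latest version of each key written at or before `at`."""
    latest = {}
    for v in sorted(versions, key=lambda v: v.ingested_at):
        if v.ingested_at <= at:
            latest[v.key] = v  # later versions overwrite earlier ones
    return list(latest.values())


def time_slice(versions, start, end):
    """Slice-style selection: versions ingested in the assumed interval [start, end)."""
    return [v for v in versions if start <= v.ingested_at < end]
```

In this model a snapshot answers "what did the bucket look like at time T?", while a slice answers "what arrived between two times?", so the same object can appear in one selection and not the other.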
There are three choices for run location:
- Inline — In the IPython kernel behind your Jupyter notebook
- Single VM — In the cloud in a single virtual machine
- EMR Cluster — Amazon EMR Cluster







Run Transform Periodically
How do you run an InfinStor Transform periodically in the cloud?
Once a transform has been captured and its basic functionality tested, it can be run periodically in the cloud.
The input data options are similar to those for an immediate run, with a few differences:
- InfinSnap — Snapshot of the chosen bucket/path at the time the run triggers
- InfinSlice — A percentage of the data ingested between the previous trigger and the current trigger. Allowed values are 10%, 25%, 50%, and 100% of the data ingested in that interval.
- No Input Data — Useful for transforms that perform their own I/O.
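The periodic InfinSlice option can be sketched as a window between two trigger times plus a percentage. The code below is a hypothetical model, not InfinStor's API. In particular, the source does not say how the percentage is chosen, so keeping the earliest records and rounding up is purely an assumption, as are the function name and the half-open window.

```python
import math
from datetime import datetime

# Percent values listed for periodic InfinSlice runs.
ALLOWED_PERCENTS = (10, 25, 50, 100)


def periodic_slice(records, previous_trigger, current_trigger, percent):
    """Pick `percent` of the records ingested since the previous trigger.

    `records` is a list of (key, ingested_at) tuples.
    """
    if percent not in ALLOWED_PERCENTS:
        raise ValueError(f"percent must be one of {ALLOWED_PERCENTS}")
    # Assumed window: previous trigger inclusive, current trigger exclusive.
    window = sorted(
        (r for r in records if previous_trigger <= r[1] < current_trigger),
        key=lambda r: r[1],
    )
    # Assumption: keep the earliest records, rounding the count up.
    keep = math.ceil(len(window) * percent / 100)
    return window[:keep]
```

With an hourly schedule, each run would pass the previous and current trigger times, so consecutive runs cover adjacent, non-overlapping windows.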





