Transforms

Short code snippets that transform your data

There are many occasions when Python code in a JupyterLab notebook may need to be run in the cloud. InfinStor offers the easiest way for cloud execution of Python code in JupyterLab.

What are InfinStor Transforms?

Code snippets that can transform your training data.

Transforms are snippets of Python code captured from your JupyterLab notebook, combined with a Conda or Docker environment for the code to run in.

InfinStor Transforms are stored in the cloud and can be executed in a variety of locations such as your own JupyterLab’s kernel, a single virtual machine in the cloud, a cluster of machines in the cloud, and more. They can be used for purposes such as data capture, ML training, ML retraining, and ML inference.

Capture Transform

How do you capture an InfinStor Transform?

The first step in remotely executing Python code from a cell in your JupyterLab notebook is to capture it, along with its execution environment, and save it in your InfinStor account as an InfinStor Transform.

The Python code in your JupyterLab cell cannot read data from the local file system. The transform may execute in a variety of environments (an IPython kernel on your JupyterLab server machine, a single VM in the cloud, or a cluster of machines in the cloud), and a file that exists on one of these machines is not available on the others. The input data for this code must therefore be stored in the cloud, preferably in a cloud object store.
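As an illustration of the point above, here is a minimal sketch of a check that a cell could apply to its input locations before being captured. The helper name `split_object_uri` and the choice of `s3://` as the object-store scheme are assumptions made for this example; they are not part of InfinStor.

```python
from urllib.parse import urlparse


def split_object_uri(uri):
    """Split an object-store URI such as s3://bucket/path/to/key into (bucket, key).

    Hypothetical helper: raises ValueError for anything that is not an s3:// URI,
    for example a local path that a remote machine could not open.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not a cloud object store URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_object_uri("s3://example-bucket/training/data.csv")
print(bucket, key)  # example-bucket training/data.csv
```

A local path such as `/home/user/data.csv` raises `ValueError`, which surfaces the problem before the code is sent to run elsewhere.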

The slides below demonstrate a step-by-step process of capturing InfinStor Transforms on JupyterLab.

Run Transform Immediately

How do you run an InfinStor Transform immediately in the cloud?

There are four choices for input data in the dialog that pops up:

  • InfinSnap — Snapshot of the state of a bucket at a specific point in time. 
  • InfinSlice — Slice of data that was ingested between a start time and an end time. 
  • No Input Data — Useful for transforms that perform their own I/O.
  • MLflow Artifact — Artifacts from a previous MLflow run can be used as input data for this transform execution.
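
The difference between the first two options can be sketched in a few lines of Python. This is a simplified model, not InfinStor's implementation: the ingestion log, the function names `snapshot` and `time_slice`, and the interval boundaries are assumptions for illustration, and the model ignores deletions and overwrites.

```python
from datetime import datetime

# Hypothetical ingestion log: (object key, time the object was ingested).
INGESTED = [
    ("a.csv", datetime(2021, 1, 1, 9, 0)),
    ("b.csv", datetime(2021, 1, 1, 12, 0)),
    ("c.csv", datetime(2021, 1, 2, 8, 0)),
]


def snapshot(objects, at):
    """Keys present at time `at`: everything ingested up to and including `at`."""
    return [key for key, ts in objects if ts <= at]


def time_slice(objects, start, end):
    """Keys ingested in the half-open interval [start, end)."""
    return [key for key, ts in objects if start <= ts < end]


print(snapshot(INGESTED, datetime(2021, 1, 1, 23, 59)))
# ['a.csv', 'b.csv']
print(time_slice(INGESTED, datetime(2021, 1, 1, 10, 0), datetime(2021, 1, 3)))
# ['b.csv', 'c.csv']
```

A snapshot is cumulative up to a single instant, while a slice selects only what arrived inside a window.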
 

There are three choices for run location:

  • Inline — Inline in the Jupyter notebook
  • Single VM — In the cloud in a single virtual machine
  • EMR Cluster — Amazon EMR Cluster

The slides below demonstrate a step-by-step process of running InfinStor Transforms immediately in the cloud.

Run Transform Periodically

How do you run an InfinStor Transform periodically in the cloud?

Once a transform has been captured and its basic functionality tested, it can be run periodically in the cloud.

The input data options are similar to those for an immediate run, but they are evaluated relative to the time each run triggers:

  • InfinSnap — Snapshot of the chosen bucket/path at the time the run triggers
  • InfinSlice — A percentage of the data ingested between the previous trigger and the current trigger. Allowed values are 10%, 25%, 50%, and 100% of the data ingested in that interval.
  • No Input Data — Useful for transforms that perform their own I/O.
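
The periodic InfinSlice option can be modeled with a short sketch. The function name `periodic_slice`, the interval boundaries, and the rule for choosing which objects make up the percentage (earliest ingested first, rounded down) are all assumptions for illustration; the text above specifies only the window and the allowed percentages.

```python
from datetime import datetime, timedelta

# Percent values listed for periodic InfinSlice runs.
ALLOWED_PERCENTS = (10, 25, 50, 100)


def periodic_slice(objects, previous_trigger, trigger, percent):
    """Return `percent` of the keys ingested in (previous_trigger, trigger].

    Assumption: the earliest-ingested objects are taken first and the count
    is rounded down. The real selection rule may differ.
    """
    if percent not in ALLOWED_PERCENTS:
        raise ValueError(f"percent must be one of {ALLOWED_PERCENTS}")
    window = sorted(
        (ts, key) for key, ts in objects if previous_trigger < ts <= trigger
    )
    count = len(window) * percent // 100
    return [key for _, key in window[:count]]


# Hypothetical daily schedule with ten objects ingested one hour apart.
trigger = datetime(2021, 1, 2)
previous_trigger = trigger - timedelta(days=1)
objects = [
    (f"obj{i}.csv", previous_trigger + timedelta(hours=i + 1)) for i in range(10)
]

print(periodic_slice(objects, previous_trigger, trigger, 50))
# ['obj0.csv', 'obj1.csv', 'obj2.csv', 'obj3.csv', 'obj4.csv']
```

Each run covers only the interval since the previous trigger, so consecutive runs do not process the same data twice.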

The slides below demonstrate a step-by-step process of running InfinStor Transforms periodically in the cloud.