Uploading Data To PISTIS

General overview

Data upload to a PISTIS Factory is handled through the Data Check‑in component. Except for streaming upload, this component is managed through the Job Configurator, a panel that allows users to create automated jobs that perform predefined data management actions. The Data Check‑in component supports multiple data ingestion methods:

  • Batch File Upload: Upload data directly from files.
  • API Upload: Ingest data through external services using APIs.
  • FTP Upload: Retrieve data from FTP servers.

The API upload and FTP upload functionalities support one‑time or periodic uploads through a scheduler that instructs the platform to fetch data at selected intervals, depending on the user’s requirements.

In addition to these options, PISTIS also supports Streaming Upload, which enables continuous ingestion of data from real‑time data streams. Unlike the other upload methods, Streaming Upload is accessed through a dedicated menu and operates independently of the Job Configurator.

Creating Data Pipelines with the Job Configurator

The Job Configurator allows users to design complete data pipelines by combining the first three upload methods (Batch File, API, and FTP) with additional steps such as data transformation and insight generation. Pipelines are created using a drag‑and‑drop interface:

  1. Select available services from the "Services Available" panel.
  2. Drag and drop them into the "Workflow Representation" area to define the execution flow.
  3. Once the pipeline is configured, click the Run Workflow button located at the bottom of the page.

After execution starts, the system returns a workflow ID.

Note: Workflow execution may take several seconds. The workflow ID can be used to monitor progress and check execution status from the Workflow Execution menu.
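The monitoring step in the note above can be pictured as a simple polling loop. This guide only describes checking status through the Workflow Execution menu, so the sketch below is an illustration, not a PISTIS API: `fetch_status` is a hypothetical callable supplied by the caller, and the status names `COMPLETED` and `FAILED` are assumptions.

```python
import time
from typing import Callable

# Hypothetical status names; the values actually shown in the
# Workflow Execution menu are not specified in this guide.
TERMINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_workflow(
    workflow_id: str,
    fetch_status: Callable[[str], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status(workflow_id) until a terminal state or a timeout."""
    waited = 0.0
    while True:
        status = fetch_status(workflow_id)
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(
                f"workflow {workflow_id} still {status!r} after {timeout_s}s"
            )
        sleep(interval_s)
        waited += interval_s


if __name__ == "__main__":
    # Simulated status source standing in for a real lookup.
    fake = iter(["RUNNING", "RUNNING", "COMPLETED"])
    print(wait_for_workflow("wf-123", lambda _id: next(fake), sleep=lambda s: None))
```

Passing the status lookup and the sleep function as parameters keeps the loop independent of any particular client and makes it easy to try out without waiting.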

More detailed explanations of each upload option and configuration step are provided in the following sections.

Uploading a Dataset from a file

To upload a dataset to the Factory Data Catalogue, the user drags and drops the "Data Check-in:uploadFile" building block from the left ("Services Available" panel) to the right-hand side of the window ("Workflow Representation" panel). The user must fill in the required dataset information: the dataset to upload, dataset name, description, category and keywords. The user may also choose whether the dataset is uploaded encrypted.
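As a way to keep track of the required information before opening the form, the fields listed above can be modelled as a small record with a completeness check. This is a minimal sketch: the class name `FileUploadForm` and its attribute names are illustrative assumptions, not identifiers used by PISTIS.

```python
from dataclasses import dataclass, field


@dataclass
class FileUploadForm:
    """Illustrative mirror of the uploadFile form (all names are assumptions)."""

    file_path: str                      # dataset to upload
    name: str                           # dataset name
    description: str
    category: str
    keywords: list[str] = field(default_factory=list)
    encrypted: bool = False             # optional choice offered by the form

    def missing_fields(self) -> list[str]:
        """Return the names of required fields that are still empty."""
        checks = {
            "file_path": self.file_path.strip(),
            "name": self.name.strip(),
            "description": self.description.strip(),
            "category": self.category.strip(),
            "keywords": [k for k in self.keywords if k.strip()],
        }
        return [label for label, value in checks.items() if not value]


if __name__ == "__main__":
    form = FileUploadForm("sales.csv", "Sales", "Monthly sales figures", "", [])
    print(form.missing_fields())
```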

[Screenshot: Upload Dataset]

As described in the general procedure, the system returns a workflow ID once the job is submitted. The upload may take a few minutes, depending on the size and nature of the dataset. The user may wait, or check the status of the job using the workflow ID in the "Workflow Execution" menu option.

[Screenshot: Upload Dataset]

Uploading a Dataset from an API

Using the drag-and-drop functionality explained previously, the user can upload data coming from web APIs (e.g., a REST service). This is done by selecting the "Data Check-in:uploadDataFromAPI" building block and filling in the information needed to access the data. The information required is shown in the following screenshot:

[Screenshot: Upload Dataset]

Note that the upload from an API can run once or periodically. For a periodic upload, the user selects a periodicity (hourly, daily or monthly) and the specific time to start the job.
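To make the periodicity options concrete, the sketch below computes when the next run would fall after a given start time. It is only an illustration of hourly, daily and monthly intervals: the function `next_run` is hypothetical, and the guide does not state how the PISTIS scheduler treats month ends, so the day clamping used here is an assumption.

```python
import calendar
from datetime import datetime, timedelta


def next_run(previous: datetime, periodicity: str) -> datetime:
    """Return the run that follows `previous` for the given periodicity."""
    if periodicity == "hourly":
        return previous + timedelta(hours=1)
    if periodicity == "daily":
        return previous + timedelta(days=1)
    if periodicity == "monthly":
        # Roll over to January of the next year after December.
        year = previous.year + previous.month // 12
        month = previous.month % 12 + 1
        # Assumption: clamp the day so that 31 January is followed by
        # the last day of February rather than an invalid date.
        day = min(previous.day, calendar.monthrange(year, month)[1])
        return previous.replace(year=year, month=month, day=day)
    raise ValueError(f"unsupported periodicity: {periodicity!r}")


if __name__ == "__main__":
    start = datetime(2024, 1, 31, 9, 0)
    for option in ("hourly", "daily", "monthly"):
        print(option, next_run(start, option))
```

One consequence of clamping is that a job started on the 31st drifts to an earlier day in later months; a scheduler that stores the originally requested day would avoid that.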

Uploading a Dataset from FTP

As in the previous case, the user may upload data coming from an FTP server. This is done by selecting the "Data Check-in:uploadDataFromFTPServer" building block and filling in the information needed to access the data. The information required is shown in the following screenshot:

[Screenshot: Upload Dataset]

As with the upload from an API, the FTP upload may run once or periodically. For a periodic upload, the user selects a periodicity (hourly, daily or monthly) and the specific time to start the job.

Performing Transformations over the Data

Using the Job Configurator panel, the user can apply pre-defined transformations to the data.

The system provides a Data Transformation Designer panel as a playground to design and test transformations over the dataset locally. The Transformation Designer page includes written instructions on its usage. Users who plan to transform the data should define the transformation in the Transformation Designer first, before uploading the file.

[Screenshot: Upload Dataset]

Once the transformation has been tested in the Transformation Designer, the user should copy the rules to the Job Configurator panel. To do so, drag and drop both the Data Check-in and the Data Transformation blocks to the right, select the file to upload and copy the transformation rules into the Transformation box:

[Screenshot: Upload Dataset]

Generating insights over the Data

The user may want to inspect the dataset by generating insights. To do so, they can either use the "Insight Generator" menu option directly or create a pipeline in the Job Configurator that combines the "Data Check-in:uploadFile" and "generateInsights" building blocks (note that this option is not available for encrypted datasets). Either way, the system generates an insights report. The following screenshot shows the Insights Generator at work within the Job Configurator panel:

[Screenshot: Upload Dataset]

Once submitted, the job uploads the file and generates a set of default insights.
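The restriction on encrypted datasets can be expressed as a small check over the list of building blocks in a pipeline. This is a sketch for illustration only: the `Step` record and `validate_pipeline` function are hypothetical, and only the block names and the rule itself come from this guide.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One building block in a pipeline (illustrative structure)."""

    block: str               # block name as shown in the Job Configurator
    encrypted: bool = False  # only meaningful for upload blocks


def validate_pipeline(steps: list[Step]) -> list[str]:
    """Return a list of problems; an empty list means no rule is violated."""
    problems = []
    uploads_encrypted = any(step.encrypted for step in steps)
    wants_insights = any(step.block == "generateInsights" for step in steps)
    if uploads_encrypted and wants_insights:
        problems.append("generateInsights is not available for encrypted datasets")
    return problems


if __name__ == "__main__":
    pipeline = [
        Step("Data Check-in:uploadFile", encrypted=True),
        Step("generateInsights"),
    ]
    print(validate_pipeline(pipeline))
```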

Dataset available in Factory Catalogue

Once the jobs defined above have completed, the ingested dataset appears in the Data Catalogue of the user's Factory.