DATA CLOUD GLOSSARY

A

Automatic differentiation is a method used to efficiently and accurately compute the gradient of a function with respect to its inputs.
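
One such method is forward-mode automatic differentiation. The sketch below is a minimal illustration only, assuming a function built from addition and multiplication; the `Dual` class and the `derivative` helper are hypothetical names introduced here, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative with respect to the input."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (u * v)' = u * v' + u' * v
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__


def derivative(f, x):
    """Evaluate df/dx at x by seeding the input's derivative with 1."""
    return f(Dual(x, 1.0)).deriv


# d/dx (3x^2 + 2x) = 6x + 2, evaluated at x = 2
slope = derivative(lambda x: 3 * x * x + 2 * x, 2.0)
```

The derivative is exact up to floating-point rounding because each arithmetic step applies a differentiation rule rather than a finite-difference approximation.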

C

Computational cost refers to the computing resources required to complete a specific task. These resources can include memory, computation time, and bandwidth.

Confidential Computing offers a hardware-based security solution designed to protect data during use with application-isolation technology.

D

Data access is the ability of users to reach their data within physical, software, legal, or policy-driven constraints.

Data anonymization protects private or sensitive information by erasing or encrypting identifiers that connect an individual to the data.

Data governance is the overall management of the availability, usability, integrity, and security of the data used in an organization.

Data labeling identifies objects in raw data such as images, text, videos, and audio. The goal is to attach one or more informative labels that give context so that a machine learning model can learn from the data.

Data masking obscures or replaces sensitive information in a dataset to minimize exposure while maintaining the data’s functional value.
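
As a rough illustration of masking, the sketch below hides all but the last four digits of a card number while keeping its length and separators, so the value still looks like a card number to downstream code. The function name `mask_card_number` and its parameters are assumptions made for this example.

```python
def mask_card_number(card_number: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask every digit except the last `visible` ones, keeping separators."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    digits_to_mask = max(total_digits - visible, 0)
    masked = []
    for ch in card_number:
        if ch.isdigit() and digits_to_mask > 0:
            masked.append(mask_char)  # hide this digit
            digits_to_mask -= 1
        else:
            masked.append(ch)  # keep separators and trailing digits
    return "".join(masked)


masked = mask_card_number("4111-1111-1111-1234")
```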

Data ownership refers to the rights and responsibilities of individuals or organizations in the collection, storage, use, and distribution of data.

Data perturbation changes an original dataset by applying techniques that round numbers and add random noise.
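
A minimal sketch of the two techniques named here, noise addition followed by rounding, might look like the following. The function `perturb`, its parameters, and the choice of Gaussian noise are assumptions for illustration; the right noise distribution and scale depend on the dataset and the privacy goal.

```python
import random


def perturb(values, noise_scale=1.0, round_to=0, seed=None):
    """Add zero-mean Gaussian noise to each value, then round the result."""
    rng = random.Random(seed)  # seed only to make the example reproducible
    return [round(v + rng.gauss(0.0, noise_scale), round_to) for v in values]


salaries = [52340.0, 61875.0, 48990.0]
# Noise with a standard deviation of 500, rounded to the nearest hundred
perturbed = perturb(salaries, noise_scale=500.0, round_to=-2, seed=42)
```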

Data control refers to the measures and processes put in place to manage the access, use, and dissemination of data within an organization.

Data redaction refers to removing certain pieces of information from data, designed to keep that data from being linked to individuals or used for wrongdoing.

Data sharing is making data available to other individuals or organizations. It involves data exchange between individuals, groups, or organizations.

Data tokenization is a process by which sensitive data is replaced by a string of non-sensitive characters known as a token.
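
One common way to implement tokenization is a lookup "vault" that maps random tokens back to the original values. The sketch below assumes an in-memory vault; the `TokenVault` class and the `tok_` prefix are hypothetical, and a real system would keep the mapping in a secured, access-controlled store.

```python
import secrets


class TokenVault:
    """Maps sensitive values to random tokens and back (in memory only)."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # random, reveals nothing about value
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("123-45-6789")
original = vault.detokenize(token)
```

Unlike encryption, the token here has no mathematical relationship to the original value, so it cannot be reversed without access to the vault.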

Data-type agnostic describes a system, process, or algorithm that can handle and process different types of data without favoring any particular type.

In data engineering and machine learning, a data pipeline refers to the steps involved in extracting, transforming, loading, and processing data.

Deployment data refers to the data used to deploy and run a machine-learning model in a production environment.

Differential privacy is a system for sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals.
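
A standard way to achieve this for numeric queries is the Laplace mechanism, which adds noise scaled to the query's sensitivity divided by a privacy parameter epsilon. The sketch below applies it to a counting query; `private_count` and `laplace_noise` are names invented for this example, and the seeded `random.Random` is used only for reproducibility, not as a secure noise source.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, seed=None):
    """Return a noisy count of the records that satisfy `predicate`."""
    rng = random.Random(seed)
    true_count = sum(1 for r in records if predicate(r))
    # A counting query has sensitivity 1: adding or removing one
    # record changes the true count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


ages = [23, 35, 41, 29, 52, 38]
noisy = private_count(ages, lambda age: age > 30, epsilon=0.5, seed=0)
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon gives answers closer to the true count.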

A technique in machine learning and artificial intelligence that involves training models using gradient-based optimization.

E

An edge system is a computing system that is located at the “edge” of a network, close to the source of the data.

Encryption is the process of converting plaintext data into a coded form of data that can only be unlocked by someone who has the appropriate key.

F

Federated Learning is a machine learning technique that enables training on a decentralized, distributed dataset.

G

Gradient-based optimization is a method used in machine learning and artificial intelligence to update the parameters of a model to minimize a loss function.
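
The simplest instance of this idea is plain gradient descent: repeatedly move each parameter a small step against the gradient of the loss. The sketch below minimizes a one-parameter quadratic loss; the function name, learning rate, and step count are illustrative assumptions.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Minimize a loss by repeatedly stepping against its gradient."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)  # update rule: w <- w - lr * dL/dw
    return w


# Loss L(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```

Machine learning frameworks apply the same update to many parameters at once and usually compute the gradient automatically.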

H

Homomorphic encryption is a form of encryption that allows computations to be performed directly on ciphertext without first decrypting the data.

I

Inference data is used to make predictions or inferences with a trained machine learning (ML) model.

J

Jupyter Notebook is an open-source interactive computing platform that allows users to create and share documents that contain live code, equations, visualizations, and narrative text.

L

Labeled data refers to data that has already been annotated or categorized with labels or tags that describe the content of the data.

M

A machine learning model is a mathematical representation of a system capable of learning from data and making predictions or decisions.

Model drift, also known as concept drift, refers to a phenomenon in machine learning where the distribution of the data changes over time and the trained model’s performance degrades as a result.

MLOps, an abbreviation for Machine Learning Operations, is a set of practices and processes for managing the end-to-end lifecycle of machine learning models.

The machine learning (ML) life cycle refers to the stages in building, deploying, and maintaining a machine learning model.

Model deployment refers to making a trained ML model accessible and usable in a real-world production environment by integrating it into a production system and monitoring its performance.

N

Natural Language Processing (NLP) concerns interactions between computers and human (natural) languages. The goal is to enable computers to process, understand, and generate human language.

Neural network is a broad term in the field of AI that refers to any type of network that is trained to process data.

O

The optimization step is the iterative process of finding the best set of parameters or weights for a machine learning model so that it accurately predicts outputs from inputs.

Open banking is a financial services model that allows third-party providers, such as fintech companies, to access bank customers’ financial data with their consent.

P

Personally Identifiable Information, abbreviated as PII, refers to any information that can be used to identify a specific individual, such as name, address, driver’s license number, etc.

PyTorch is an open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing.

R

Responsible AI is a set of practices and principles designed to ensure and promote the safe and ethical use of AI.

S

Sensitive data is data that is confidential, private, or protected by regulations. This may include personal information and financial information.

Structured data refers to data that is organized in a tabular format with well-defined columns and rows.

Synthetic data is artificially generated data that is used to mimic real-world data. Synthetic data is often used for testing and training machine learning models.

Semi-structured data does not follow any data model because it does not have a fixed schema. Unlike structured/tabular data, it lacks any rigid form.

T

Tabular data refers to data organized in a table format with rows and columns. Each row represents an instance or an observation, and each column represents a feature or an attribute of those observations.

TensorFlow is an open-source software library for dataflow and differentiable programming across various tasks.

Training data is the initial data fed into the system to train the ML algorithm. People (workforce), Process (business rules,

The training model is a part of the data science lifecycle wherein datasets are used to train machine learning algorithms.

A training model refers to a quantitative representation of a problem that is used to learn patterns and relationships in training data.

Task-agnostic refers to algorithms or models that can be applied to various jobs, regardless of the specific task being performed.

A trust boundary refers to a clear distinction between the parts of a system that are trusted to behave correctly and securely and those that are not.

U

Unstructured data refers to data that cannot be easily processed and analyzed using traditional machine learning algorithms due to its lack of a predefined format.

Unlabeled data is data not tagged with labels identifying characteristics, properties, or classifications.