New to AI gig work? This glossary covers the key terms you will encounter across platforms, job listings, and industry discussions. From RLHF to tokenization, we break down the jargon so you can hit the ground running.
Adversarial Testing: A method of evaluating AI systems by deliberately trying to make them produce incorrect, harmful, or unintended outputs through carefully crafted inputs.
Alignment: The process of ensuring an AI system behaves in accordance with human values, intentions, and expectations. A core challenge in modern AI safety research.
Annotation: The process of adding labels, tags, or metadata to data (text, images, audio) so that machine learning models can learn from structured examples.
Attention: A neural network component that allows a model to focus on the most relevant parts of an input when generating output. The foundation of modern transformer architectures.
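The core computation can be sketched as scaled dot-product attention. This is a toy NumPy version with made-up query, key, and value vectors, not a production implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query scores every key, then takes a softmax-weighted sum of values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax -> attention weights
    return weights @ V                                # weighted sum of values

# Hypothetical tiny example: one query attending over two keys
Q = np.array([[1.0, 0.0]])
K = np.array([[1.0, 0.0], [0.0, 1.0]])
V = np.array([[10.0], [20.0]])
out = scaled_dot_product_attention(Q, K, V)
```

Because the query is more similar to the first key, the output lands closer to that key's value (10) than to the second (20).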
Benchmark: A standardized test or dataset used to measure and compare the performance of different AI models on specific tasks like reasoning, coding, or language understanding.
Bias: Systematic errors in AI model outputs that reflect unfair prejudices in training data or model design, potentially leading to discriminatory or skewed results.
Bounding Box: A rectangular border drawn around an object in an image to identify its location. Commonly used in computer vision annotation tasks for object detection training.
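Annotated boxes are often compared to model predictions with intersection-over-union (IoU), a standard overlap score. A minimal sketch, with coordinates invented for illustration:

```python
def box_area(box):
    # Box given as (x_min, y_min, x_max, y_max) in pixel coordinates
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

def iou(a, b):
    """Intersection-over-union: overlap area divided by combined area, 0.0-1.0."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = box_area((ix1, iy1, ix2, iy2))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union else 0.0

# Hypothetical annotation and a slightly offset prediction
annotation = {"label": "car", "box": (40, 30, 120, 90)}
prediction = (50, 30, 120, 90)
score = iou(annotation["box"], prediction)
```

A score near 1.0 means the predicted box closely matches the human-drawn one; many annotation pipelines use an IoU threshold to decide whether a detection counts as correct.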
Chain-of-Thought (CoT): A prompting technique that encourages AI models to show their reasoning step by step, leading to more accurate answers on complex problems like math or logic.
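A rough illustration of the difference between a direct prompt and a chain-of-thought prompt; the question and worked steps here are invented for the example:

```python
question = ("A pack of pens costs $3. If I buy 4 packs and pay with a $20 bill, "
            "how much change do I get?")

# Direct prompt: asks for the answer immediately
direct_prompt = f"Q: {question}\nA:"

# Chain-of-thought prompt: demonstrates intermediate reasoning before the answer
cot_prompt = (
    f"Q: {question}\n"
    "A: Let's think step by step.\n"
    "1. Four packs cost 4 * $3 = $12.\n"
    "2. Change from $20 is $20 - $12 = $8.\n"
    "So the answer is $8."
)
```

The phrase "Let's think step by step" (or an exemplar showing worked reasoning, as above) is what nudges the model to produce intermediate steps on new questions.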
Classification: A machine learning task where the model assigns input data to one or more predefined categories, such as spam detection, sentiment analysis, or image recognition.
Context Window: The maximum amount of text (measured in tokens) that a language model can process at once. Larger context windows allow models to handle longer documents and conversations.
Data Annotation: The practice of labeling raw data (images, text, audio, video) with meaningful tags so machine learning models can learn patterns. One of the most common AI gig jobs.
Domain Expert: A professional with deep knowledge in a specific field (medicine, law, finance) who helps evaluate and improve AI outputs in their area of expertise.
Edge Case: An unusual or extreme input scenario that an AI model may handle poorly. Identifying and addressing edge cases is essential for building robust AI systems.
Embeddings: Dense numerical vector representations of data (words, sentences, images) that capture semantic meaning, allowing AI models to understand similarity and relationships.
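Similarity between embeddings is commonly measured with cosine similarity. A minimal sketch using hypothetical 3-dimensional vectors (real models use hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Made-up toy embeddings: related words get nearby vectors
emb = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}
cat_dog = cosine_similarity(emb["cat"], emb["dog"])
cat_car = cosine_similarity(emb["cat"], emb["car"])
```

Here "cat" scores much closer to "dog" than to "car", which is exactly the property search, clustering, and recommendation systems exploit.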
Few-Shot Learning: A technique where an AI model is given a small number of examples in the prompt to guide its behavior on a specific task, without requiring full retraining.
Fine-Tuning: The process of further training a pre-trained AI model on a specific, smaller dataset to specialize it for a particular task or domain.
Golden Response: An ideal, expert-written answer used as a reference standard for evaluating AI model outputs. Often created by domain experts as part of RLHF training data.
Grounding: The technique of connecting AI model outputs to verifiable sources of information, reducing hallucinations and improving factual accuracy in generated responses.
Hallucination: When an AI model generates plausible-sounding but factually incorrect or fabricated information. A major challenge in large language model deployment.
Human-in-the-Loop (HITL): An AI system design where humans are involved in the training, evaluation, or decision-making process to ensure quality and catch errors that automated systems miss.
Inference: The process of running a trained AI model to generate predictions or outputs from new input data. Distinct from training, which is how the model learns.
Labeling: The process of assigning descriptive tags or categories to data points (text, images, audio) for use in supervised machine learning training.
Large Language Model (LLM): A neural network trained on massive text datasets that can understand and generate human-like text. Examples include GPT-4, Claude, and Gemini.
Model: A mathematical system trained on data to make predictions or generate outputs. In AI, models range from simple classifiers to complex language and vision systems.
Overfitting: When an AI model memorizes training data too closely and performs well on known examples but poorly on new, unseen data. A common challenge in model development.
Prompt: The input text or instruction given to an AI model to elicit a response. Effective prompt design significantly impacts the quality and relevance of model outputs.
Prompt Engineering: The practice of designing and refining input prompts to get optimal results from AI models. A growing professional skill and one of the most in-demand AI gig roles.
Ranking: The task of ordering multiple AI-generated responses from best to worst based on criteria like helpfulness, accuracy, and safety. A core component of RLHF training.
Reasoning: An AI model's ability to logically process information, draw conclusions, and solve problems step by step, rather than simply pattern matching from training data.
Red Teaming: A structured approach to testing AI systems by having humans or automated tools actively try to find flaws, biases, and safety vulnerabilities in model behavior.
RLHF (Reinforcement Learning from Human Feedback): A training technique where AI models are improved using human evaluators who rate and rank model outputs, teaching the system to produce more helpful and aligned responses.
Safety: The field of ensuring AI systems do not produce harmful, dangerous, or unethical outputs. Encompasses alignment research, red teaming, content filtering, and responsible deployment.
Sentiment Analysis: A natural language processing task that identifies and categorizes opinions expressed in text as positive, negative, or neutral. Widely used in business analytics.
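The idea can be illustrated with a toy lexicon-based classifier; production systems use trained models rather than fixed word lists, and the word sets below are invented for the example:

```python
# Hypothetical tiny sentiment lexicons
POSITIVE = {"great", "love", "excellent", "amazing"}
NEGATIVE = {"terrible", "hate", "awful", "broken"}

def sentiment(text):
    """Toy classifier: count positive vs. negative words in the text."""
    words = set(text.lower().replace("!", "").replace(".", "").split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

As an annotator, you would typically be the one producing the positive/negative/neutral labels that a real model is then trained on.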
Temperature: A parameter that controls the randomness of AI model outputs. Lower temperatures produce more predictable, focused responses; higher temperatures yield more creative, diverse outputs.
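Under the hood, temperature divides the model's raw scores (logits) before the softmax that turns them into a probability distribution over next tokens. A sketch with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature: low T sharpens the distribution, high T flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                     # hypothetical scores for 3 candidate tokens
cold = softmax_with_temperature(logits, temperature=0.2)  # nearly deterministic
hot = softmax_with_temperature(logits, temperature=2.0)   # closer to uniform
```

At low temperature nearly all probability mass lands on the top-scoring token; at high temperature the alternatives get meaningful probability, which is where the "creative" behavior comes from.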
Tokenization: The process of breaking text into smaller units called tokens (words, subwords, or characters) that AI models can process. Token count determines input limits and costs.
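A greedy longest-match sketch shows how a word splits into subword tokens. Real tokenizers (e.g. BPE-based ones) learn vocabularies of tens of thousands of pieces; the tiny vocabulary here is invented for illustration:

```python
# Hypothetical tiny subword vocabulary
VOCAB = {"token", "iza", "tion", "un", "able"}

def tokenize(word):
    """Greedy longest-match subword tokenization over a fixed vocabulary."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest piece first
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])          # unknown character: emit it alone
            i += 1
    return tokens
```

So "tokenization" becomes three tokens rather than one, which is why token counts (and therefore context limits and API costs) rarely equal word counts.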
Training Data: The labeled or structured datasets used to teach AI models patterns and behaviors. Quality training data is essential for building accurate and reliable AI systems.
Transfer Learning: A technique where a model trained on one task is adapted for a different but related task, leveraging previously learned patterns to speed up training and improve performance.
Zero-Shot: An AI model's ability to perform a task it was not explicitly trained on, using only the task description in the prompt without any examples.