# 🧠 AI Lab – Transformers CLI Playground

> A **pedagogical and technical project** designed for AI practitioners and students to explore **Hugging Face Transformers** through an **interactive Command-Line Interface (CLI)** or a **REST API**.
> This playground provides ready-to-use NLP pipelines — including **Sentiment Analysis**, **Named Entity Recognition**, **Text Generation**, **Fill-Mask**, **Question Answering (QA)**, **Moderation**, and more — in a modular, extensible, and educational codebase.

---

<p align="center">
  <img src="https://img.shields.io/badge/Python-3.13-blue.svg" alt="Python"/>
  <img src="https://img.shields.io/badge/Built_with-Poetry-purple.svg" alt="Poetry"/>
  <img src="https://img.shields.io/badge/🤗-Transformers-orange.svg" alt="Transformers"/>
  <img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License"/>
</p>

---

## 📑 Table of Contents

-   [📚 Overview](#-overview)
-   [🗂️ Project Structure](#️-project-structure)
-   [⚙️ Installation](#️-installation)
    -   [🧾 Option 1 – Poetry (Recommended)](#-option-1--poetry-recommended)
    -   [📦 Option 2 – Pip + requirements.txt](#-option-2--pip--requirementstxt)
-   [▶️ Usage](#️-usage)
    -   [🖥️ CLI Mode](#️-cli-mode)
    -   [🌐 API Mode](#-api-mode)
-   [📡 API Endpoints](#-api-endpoints)
-   [🖥️ CLI Examples](#️-cli-examples)
-   [🧠 Architecture Overview](#-architecture-overview)
-   [⚙️ Configuration](#️-configuration)
-   [🧩 Extending the Playground](#-extending-the-playground)
-   [🧰 Troubleshooting](#-troubleshooting)
-   [🧭 Development Guidelines](#-development-guidelines)
-   [🧱 Roadmap](#-roadmap)
-   [📜 License](#-license)

---

## 📚 Overview

The **AI Lab – Transformers CLI Playground** enables users to explore **multiple NLP tasks directly from the terminal or via HTTP APIs**.
Each task (sentiment, NER, text generation, etc.) is implemented as a **Command Module** that communicates with a **Pipeline Module** powered by Hugging Face’s `transformers` library.

The project demonstrates **clean ML code architecture** with strict separation between:

-   Configuration
-   Pipelines
-   CLI logic
-   Display formatting

It’s a great educational resource for learning **how to structure ML applications** professionally.

---

## 🗂️ Project Structure

```text
src/
├── main.py                 # CLI entry point
│
├── cli/
│   ├── base.py             # CLICommand base class & interactive shell
│   └── display.py          # Console formatting utilities (colors, tables, results)
│
├── commands/               # User-facing commands wrapping pipeline logic
│   ├── sentiment.py        # Sentiment analysis command
│   ├── fillmask.py         # Masked token prediction
│   ├── textgen.py          # Text generation
│   ├── ner.py              # Named Entity Recognition
│   ├── qa.py               # Question Answering (extractive)
│   └── moderation.py       # Content moderation / toxicity detection
│
├── pipelines/              # ML logic based on Hugging Face pipelines
│   ├── template.py         # Blueprint for creating new pipelines
│   ├── sentiment.py
│   ├── fillmask.py
│   ├── textgen.py
│   ├── ner.py
│   ├── qa.py
│   └── moderation.py
│
├── api/
│   ├── app.py              # FastAPI app and endpoints
│   ├── models.py           # Pydantic schemas
│   └── config.py           # API configuration
│
└── config/
    └── settings.py         # Global configuration (models, params)
```

---

## ⚙️ Installation

### 🧾 Option 1 – Poetry (Recommended)

> Poetry is the main dependency manager for this project.

```bash
poetry install   # creates the virtual environment and installs dependencies
```

This installs all dependencies defined in `pyproject.toml` (including `transformers`, `torch`, and `fastapi`).

Run the app:

```bash
# CLI mode
poetry run python -m src.main --mode cli

# API mode
poetry run python -m src.main --mode api
```

---

### 📦 Option 2 – Pip + requirements.txt

If you prefer manual dependency management:

```bash
python -m venv .venv
source .venv/bin/activate        # Linux/macOS
# .venv\Scripts\Activate.ps1     # Windows (PowerShell): run this line instead

pip install -r requirements.txt
```

---

## ▶️ Usage

### 🖥️ CLI Mode

Run the interactive CLI:

```bash
python -m src.main --mode cli
```

Interactive menu:

```text
Welcome to AI Lab - Transformers CLI Playground
Available commands:
  • sentiment     – Analyze the sentiment of a text
  • fillmask      – Predict masked words in a sentence
  • textgen       – Generate text from a prompt
  • ner           – Extract named entities from text
  • qa            – Answer questions from a context
  • moderation    – Detect toxic or unsafe content
```

---

### 🌐 API Mode

Run the FastAPI server:

```bash
python -m src.main --mode api
# Custom host and port, with auto-reload
python -m src.main --mode api --host 0.0.0.0 --port 8000 --reload
```

API Docs:

-   **Swagger** → http://localhost:8000/docs
-   **ReDoc** → http://localhost:8000/redoc
-   **OpenAPI** → http://localhost:8000/openapi.json
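
The flags used above (`--mode`, `--host`, `--port`, `--reload`) suggest an argument parser along the following lines. This is a minimal sketch, not the project's actual `src/main.py`: the defaults for `--host` and `--port` and the `describe` helper are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="src.main")
    # The flag names come from the usage examples; the defaults are assumptions.
    parser.add_argument("--mode", choices=["cli", "api"], required=True)
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument("--reload", action="store_true")
    return parser


def describe(args: argparse.Namespace) -> str:
    """Summarize what an entry point would start for these arguments."""
    if args.mode == "cli":
        return "interactive CLI"
    suffix = " (auto-reload)" if args.reload else ""
    return f"API server on http://{args.host}:{args.port}{suffix}"


args = build_parser().parse_args(["--mode", "api", "--port", "8001", "--reload"])
print(describe(args))
```

In a real entry point, the API branch would typically pass `host`, `port`, and `reload` on to `uvicorn.run`.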

---

## 📡 API Endpoints

### Core Endpoints

| Method | Endpoint  | Description               |
| ------ | --------- | ------------------------- |
| `GET`  | `/`       | Health check and API info |
| `GET`  | `/health` | Detailed health status    |

### Individual Processing

| Method | Endpoint      | Description            |
| ------ | ------------- | ---------------------- |
| `POST` | `/sentiment`  | Analyze text sentiment |
| `POST` | `/fillmask`   | Predict masked words   |
| `POST` | `/textgen`    | Generate text          |
| `POST` | `/ner`        | Extract named entities |
| `POST` | `/qa`         | Question answering     |
| `POST` | `/moderation` | Content moderation     |
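
Calling one of these endpoints from Python could look like the sketch below, which uses only the standard library. The request schema is not documented in this README, so the `"text"` field name is an assumption; check the Swagger page at `/docs` for the real Pydantic models. `build_request` and `post_json` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # host and port used by the API docs links


def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Prepare a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint: str, payload: dict) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(endpoint, payload), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumed payload shape: {"text": ...}. Verify against /docs before relying on it.
request = build_request("/sentiment", {"text": "I absolutely love this project!"})
print(request.get_method(), request.full_url)
```

With the server running, `post_json("/sentiment", {...})` would return the decoded response body.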

### Batch Processing

| Method | Endpoint            | Description                |
| ------ | ------------------- | -------------------------- |
| `POST` | `/sentiment/batch`  | Process multiple texts     |
| `POST` | `/fillmask/batch`   | Fill multiple masked texts |
| `POST` | `/textgen/batch`    | Generate from prompts      |
| `POST` | `/ner/batch`        | Extract entities in batch  |
| `POST` | `/qa/batch`         | Answer questions in batch  |
| `POST` | `/moderation/batch` | Moderate multiple texts    |
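
A client holding a long list of texts may want to split it into bounded chunks before calling a batch endpoint. The sketch below does that with a hypothetical `chunked` helper and borrows the `BATCH_SIZE = 8` value from the Configuration section as the chunk size. Whether the server enforces a per-request limit is not stated here, so treat that size as an assumption.

```python
from typing import Iterator, List, Sequence

BATCH_SIZE = 8  # mirrors Config.BATCH_SIZE; using it client-side is an assumption


def chunked(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


texts = [f"review number {i}" for i in range(20)]
for batch in chunked(texts):
    # Each batch could be sent as one request to an endpoint such as /sentiment/batch.
    print(len(batch))
```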

---

## 🖥️ CLI Examples

### 🔹 Sentiment Analysis

```text
💬 Enter text: I absolutely love this project!
→ Sentiment: POSITIVE (score: 0.998)
```

### 🔹 Fill-Mask

```text
💬 Enter text: The capital of France is [MASK].
→ Predictions:
  1) paris      score: 0.87
  2) lyon       score: 0.04
```

### 🔹 Text Generation

```text
💬 Prompt: Once upon a time
→ Output: Once upon a time there was a young AI learning to code...
```

### 🔹 NER

```text
💬 Enter text: Elon Musk founded SpaceX in California.
→ Entities:
  - Elon Musk  (PER)
  - SpaceX     (ORG)
  - California (LOC)
```

### 🔹 QA (Question Answering)

```text
💬 Enter question: What is the capital of France?
💬 Enter context: France is a country in Europe. Its capital is Paris.
→ Answer: Paris
```

### 🔹 Moderation

```text
💬 Enter text: I hate everything!
→ Result: FLAGGED (toxic content detected)
```

---

## 🧠 Architecture Overview

Both CLI and API share the **same pipeline layer**, ensuring code reusability and consistency.

### CLI Architecture

```text
InteractiveCLI → Command Layer → Pipeline Layer → Display Layer
```

### API Architecture

```text
FastAPI App → Pydantic Models → Pipeline Layer → JSON Response
```

| Layer        | Description                                    |
| ------------ | ---------------------------------------------- |
| **CLI**      | Manages user input/output and navigation.      |
| **API**      | Exposes endpoints with automatic OpenAPI docs. |
| **Command**  | Encapsulates user-facing operations.           |
| **Pipeline** | Wraps Hugging Face’s pipelines.                |
| **Models**   | Validates requests/responses.                  |
| **Display**  | Formats console output.                        |
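
The layering in the table can be illustrated with a small sketch. The class and function names below (`SentimentPipeline`, `SentimentCommand`, `format_sentiment`) are hypothetical and do not mirror the project's real code. `fake_classifier` stands in for a Hugging Face text-classification pipeline so the example runs without downloading a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SentimentResult:
    label: str
    score: float


class SentimentPipeline:
    """Pipeline layer: owns the model call and nothing else."""

    def __init__(self, classifier: Callable[[str], List[Dict]]):
        # `classifier` has the shape of a text-classification pipeline:
        # text -> [{"label": ..., "score": ...}]
        self._classifier = classifier

    def analyze(self, text: str) -> SentimentResult:
        raw = self._classifier(text)[0]
        return SentimentResult(label=raw["label"], score=float(raw["score"]))


def format_sentiment(result: SentimentResult) -> str:
    """Display layer: turns a result into console text."""
    return f"→ Sentiment: {result.label} (score: {result.score:.3f})"


class SentimentCommand:
    """Command layer: handles input/output and delegates ML to the pipeline."""

    def __init__(self, pipeline: SentimentPipeline):
        self._pipeline = pipeline

    def run(self, text: str) -> str:
        return format_sentiment(self._pipeline.analyze(text))


def fake_classifier(text: str) -> List[Dict]:
    # Stand-in for a real model so the sketch runs offline.
    return [{"label": "POSITIVE", "score": 0.998}]


command = SentimentCommand(SentimentPipeline(fake_classifier))
print(command.run("I absolutely love this project!"))
```

An API endpoint would reuse `SentimentPipeline.analyze` directly and return the result as JSON instead of passing it to the display function.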

---

## ⚙️ Configuration

All configuration is centralized in `src/config/settings.py`:

```python
class Config:
    DEFAULT_MODELS = {
        "sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
        "fillmask":  "bert-base-uncased",
        "textgen":   "gpt2",
        "ner":       "dslim/bert-base-NER",
        "qa":        "distilbert-base-cased-distilled-squad",
        "moderation":"unitary/toxic-bert",
    }
    MAX_LENGTH = 512
    BATCH_SIZE = 8
```
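
A pipeline module could look up its model name from this mapping, optionally letting the caller override it. The sketch below repeats the `Config` class so it runs on its own; `resolve_model` is a hypothetical helper, not part of the project.

```python
from typing import Optional


class Config:
    DEFAULT_MODELS = {
        "sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
        "fillmask": "bert-base-uncased",
        "textgen": "gpt2",
        "ner": "dslim/bert-base-NER",
        "qa": "distilbert-base-cased-distilled-squad",
        "moderation": "unitary/toxic-bert",
    }
    MAX_LENGTH = 512
    BATCH_SIZE = 8


def resolve_model(task: str, override: Optional[str] = None) -> str:
    """Return the override if given, else the configured default for `task`."""
    if override:
        return override
    try:
        return Config.DEFAULT_MODELS[task]
    except KeyError:
        known = ", ".join(sorted(Config.DEFAULT_MODELS))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}") from None


print(resolve_model("ner"))
print(resolve_model("textgen", override="distilgpt2"))
```

Failing fast on an unknown task name reports a typo at lookup time rather than as a confusing model-download error later.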

---

## 🧩 Extending the Playground

To add a new NLP experiment (e.g., keyword extraction):

1. Duplicate `src/pipelines/template.py` → `src/pipelines/keywords.py`
2. Create a command: `src/commands/keywords.py`
3. Register it in `src/main.py`
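4. Add Pydantic models in `src/api/models.py` and an endpoint in `src/api/app.py`
5. Add the default model to `Config.DEFAULT_MODELS` in `src/config/settings.py`

Because the CLI command and the API endpoint both call the same pipeline module, the ML logic is written only once.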
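
The first three steps can be sketched as follows. Everything here is hypothetical: the class names, the `COMMANDS` registry, and the word-frequency logic, which is only a placeholder for a real model-backed pipeline. The actual `template.py` and `CLICommand` base class may expose different interfaces.

```python
import re
from collections import Counter
from typing import List


class KeywordsPipeline:
    """Step 1: pipeline module (placeholder logic instead of a real model)."""

    def extract(self, text: str, top_k: int = 3) -> List[str]:
        # Count words of four or more letters and keep the most frequent ones.
        words = re.findall(r"[a-z]{4,}", text.lower())
        return [word for word, _ in Counter(words).most_common(top_k)]


class KeywordsCommand:
    """Step 2: command module; no ML here, only input and output."""

    name = "keywords"

    def __init__(self, pipeline: KeywordsPipeline):
        self._pipeline = pipeline

    def run(self, text: str) -> str:
        return "→ Keywords: " + ", ".join(self._pipeline.extract(text))


# Step 3: registration, shown here as a plain dict keyed by command name.
COMMANDS = {KeywordsCommand.name: KeywordsCommand(KeywordsPipeline())}

print(COMMANDS["keywords"].run("Models wrap models; pipelines wrap models too."))
```

An API endpoint for step 4 would call `KeywordsPipeline.extract` directly and return the list as JSON.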

---

## 🧰 Troubleshooting

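| Issue                    | Solution                                                                                          |
| ------------------------ | ------------------------------------------------------------------------------------------------- |
| `transformers` not found | Activate the virtual environment, or prefix commands with `poetry run`.                           |
| Torch install fails      | Install the CPU-only wheel: `pip install torch --index-url https://download.pytorch.org/whl/cpu`. |
| Models download slowly   | Expected on first use only; downloaded models are cached locally and reused.                      |
| Encoding issues          | Ensure the terminal uses UTF-8 (the CLI output contains emoji).                                   |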

### API Issues

| Issue                | Solution                                                    |
| -------------------- | ----------------------------------------------------------- |
| `FastAPI` missing    | `pip install fastapi "uvicorn[standard]"`                   |
| Port in use          | Pick another port, e.g. `--port 8001`                       |
| CORS error           | Edit `allow_origins` in `src/api/config.py`                 |
| Validation error 422 | Check the request body against the schema shown at `/docs`  |
| 500 error            | Check the server logs to confirm the model loaded correctly |
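
Several of the issues above come down to a package missing from the active environment. A quick check, assuming the four packages named in this README are the ones that matter, can be run from the same interpreter that launches the app. `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec
from typing import Iterable, List

# Packages mentioned in the installation and troubleshooting sections.
REQUIRED = ["transformers", "torch", "fastapi", "uvicorn"]


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level packages that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If a package shows up as missing although it was installed, the script is probably running outside the virtual environment.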

---

## 🧭 Development Guidelines

-   Keep command classes lightweight (no ML inside)
-   Use the pipeline template for new tasks
-   Format all outputs via `DisplayFormatter`
-   Document new commands and models

---

## 🧱 Roadmap

-   [ ] Non-interactive CLI flags (`--text`, `--task`)
-   [ ] Multilingual models
-   [ ] Test coverage
-   [ ] Logging & profiling
-   [ ] Export to JSON/CSV

---

## 📜 License

Licensed under the [MIT License](./LICENSE).
You are free to use, modify, and distribute this project.

---

✨ **End of Documentation**
_The AI Lab – Transformers CLI Playground: built for learning, experimenting, and sharing NLP excellence._