# 🧠 AI Lab Transformers CLI Playground
> A **pedagogical and technical project** designed for AI practitioners and students to experiment with Hugging Face Transformers through an **interactive Command-Line Interface (CLI)**.
> This playground provides ready-to-use NLP pipelines (Sentiment Analysis, Named Entity Recognition, Text Generation, Fill-Mask, Moderation, etc.) in a modular, extensible, and educational codebase.
---
## 📚 Overview
The **AI Lab Transformers CLI Playground** allows you to explore multiple natural language processing tasks directly from the terminal.
Each task (e.g., sentiment, NER, text generation) is implemented as a **Command Module**, which interacts with a **Pipeline Module** built on top of the `transformers` library.
The lab is intentionally structured to demonstrate **clean software design for ML codebases** — with strict separation between configuration, pipelines, CLI logic, and display formatting.
---
## 🗂️ Project Structure
```text
src/
├── __init__.py
├── main.py # CLI entry point
├── cli/
│ ├── __init__.py
│ ├── base.py # CLICommand base class & interactive shell handler
│ └── display.py # Console formatting utilities (tables, colors, results)
├── commands/ # User-facing commands wrapping pipeline logic
│ ├── __init__.py
│ ├── sentiment.py # Sentiment analysis command
│ ├── fillmask.py # Masked token prediction command
│ ├── textgen.py # Text generation command
│ ├── ner.py # Named Entity Recognition command
│ └── moderation.py # Toxicity / content moderation command
├── pipelines/ # Machine learning logic (Hugging Face Transformers)
│ ├── __init__.py
│ ├── template.py # Blueprint for creating new pipelines
│ ├── sentiment.py
│ ├── fillmask.py
│ ├── textgen.py
│ ├── ner.py
│ └── moderation.py
├── api/
│ ├── __init__.py
│ ├── app.py # FastAPI application with all endpoints
│ ├── models.py # Pydantic request/response models
│ └── config.py # API-specific configuration
└── config/
├── __init__.py
└── settings.py # Global configuration (default models, parameters)
```
---
## ⚙️ Installation
### 🧾 Option 1: Using Poetry (Recommended)
> Poetry is used as the main dependency manager.
```bash
# 1. Install dependencies (Poetry creates the virtual environment if needed)
poetry install
# 2. Spawn a shell inside the environment
poetry shell
```
This will automatically install all dependencies declared in `pyproject.toml`, including **transformers**, **torch**, and **FastAPI** for the API mode.
To run the application inside the Poetry environment:
```bash
# CLI mode
poetry run python src/main.py --mode cli
# API mode
poetry run python src/main.py --mode api
```
---
### 📦 Option 2: Using pip and requirements.txt
If you prefer using `requirements.txt` manually:
```bash
# 1. Create a virtual environment
python -m venv .venv
# 2. Activate it
# Linux/macOS
source .venv/bin/activate
# Windows PowerShell
.venv\Scripts\Activate.ps1
# 3. Install dependencies
pip install -r requirements.txt
```
---
## ▶️ Usage
The application supports two modes: **CLI** (interactive) and **API** (REST server).
### 🖥️ CLI Mode
Launch the interactive CLI with:
```bash
python -m src.main --mode cli
# or, if using Poetry
poetry run python src/main.py --mode cli
```
You'll see an interactive menu listing the available commands:
```
Welcome to AI Lab - Transformers CLI Playground
Available commands:
• sentiment Analyze the sentiment of a text
• fillmask Predict masked words in a sentence
• textgen Generate text from a prompt
• ner Extract named entities from text
• moderation Detect toxic or unsafe content
```
### 🌐 API Mode
Launch the FastAPI server with:
```bash
python -m src.main --mode api
# or with custom settings
python -m src.main --mode api --host 0.0.0.0 --port 8000 --reload
```
The API will be available at:
- **Swagger Documentation**: http://localhost:8000/docs
- **ReDoc Documentation**: http://localhost:8000/redoc
- **OpenAPI Schema**: http://localhost:8000/openapi.json
## 📡 API Endpoints
The REST API provides all CLI functionality through HTTP endpoints:
### Core Endpoints
| Method | Endpoint | Description |
| ------ | --------- | -------------------------------- |
| `GET` | `/` | Health check and API information |
| `GET` | `/health` | Detailed health status |
### Individual Processing
| Method | Endpoint | Description | Input |
| ------ | ------------- | ------------------------ | ------------------------------------------------------------------ |
| `POST` | `/sentiment` | Analyze text sentiment | `{"text": "string", "model": "optional"}` |
| `POST` | `/fillmask` | Fill masked words | `{"text": "Hello [MASK]", "model": "optional"}` |
| `POST` | `/textgen` | Generate text | `{"text": "prompt", "model": "optional"}` |
| `POST` | `/ner` | Named entity recognition | `{"text": "string", "model": "optional"}` |
| `POST` | `/qa` | Question answering | `{"question": "string", "context": "string", "model": "optional"}` |
| `POST` | `/moderation` | Content moderation | `{"text": "string", "model": "optional"}` |
### Batch Processing
| Method | Endpoint | Description | Input |
| ------ | ------------------- | ------------------------------------ | ---------------------------------------------------- |
| `POST` | `/sentiment/batch` | Process multiple texts | `{"texts": ["text1", "text2"], "model": "optional"}` |
| `POST` | `/fillmask/batch` | Fill multiple masked texts | `{"texts": ["text1 [MASK]"], "model": "optional"}` |
| `POST` | `/textgen/batch` | Generate from multiple prompts | `{"texts": ["prompt1"], "model": "optional"}` |
| `POST` | `/ner/batch` | Extract entities from multiple texts | `{"texts": ["text1"], "model": "optional"}` |
| `POST` | `/moderation/batch` | Moderate multiple texts | `{"texts": ["text1"], "model": "optional"}` |
### Example API Usage
#### 🔹 Sentiment Analysis
```bash
curl -X POST "http://localhost:8000/sentiment" \
-H "Content-Type: application/json" \
-d '{"text": "I absolutely love this project!"}'
```
Response:
```json
{
"success": true,
"label": "POSITIVE",
"score": 0.998,
"model_used": "distilbert-base-uncased-finetuned-sst-2-english"
}
```
#### 🔹 Named Entity Recognition
```bash
curl -X POST "http://localhost:8000/ner" \
-H "Content-Type: application/json" \
-d '{"text": "Elon Musk founded SpaceX in California."}'
```
Response:
```json
{
"success": true,
"entities": [
{ "word": "Elon Musk", "label": "PERSON", "score": 0.999 },
{ "word": "SpaceX", "label": "ORG", "score": 0.998 },
{ "word": "California", "label": "LOC", "score": 0.995 }
],
"model_used": "dslim/bert-base-NER"
}
```
#### 🔹 Batch Processing
```bash
curl -X POST "http://localhost:8000/sentiment/batch" \
-H "Content-Type: application/json" \
-d '{"texts": ["Great product!", "Terrible experience", "It was okay"]}'
```
Response:
```json
{
"success": true,
"results": [
{ "label": "POSITIVE", "score": 0.998 },
{ "label": "NEGATIVE", "score": 0.995 },
{ "label": "POSITIVE", "score": 0.654 }
],
"model_used": "distilbert-base-uncased-finetuned-sst-2-english"
}
```
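The same batch endpoint can also be called from Python. Below is a minimal stdlib sketch; the helper names are illustrative and the base URL assumes the default host/port from the examples above:

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # assumed default from the examples above

def build_batch_payload(texts, model=None):
    """Build the JSON body expected by the batch endpoints."""
    payload = {"texts": list(texts)}
    if model is not None:
        payload["model"] = model
    return payload

def post_batch_sentiment(texts, model=None):
    """POST to /sentiment/batch and return the parsed JSON response."""
    body = json.dumps(build_batch_payload(texts, model)).encode("utf-8")
    req = urllib.request.Request(
        f"{API_URL}/sentiment/batch",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def labels(response):
    """Pull just the predicted labels out of a batch response."""
    return [r["label"] for r in response.get("results", [])]
```

With the server running, `post_batch_sentiment(["Great product!"])` returns the same JSON shown above, and `labels(...)` reduces it to a list of labels.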
---
## 🖥️ CLI Examples
#### 🔹 Sentiment Analysis
```text
💬 Enter text: I absolutely love this project!
→ Sentiment: POSITIVE (score: 0.998)
```
#### 🔹 FillMask
```text
💬 Enter text: The capital of France is [MASK].
→ Predictions:
1) Paris score: 0.87
2) Lyon score: 0.04
3) London score: 0.02
```
#### 🔹 Text Generation
```text
💬 Prompt: Once upon a time
→ Output: Once upon a time there was a young AI learning to code...
```
#### 🔹 NER (Named Entity Recognition)
```text
💬 Enter text: Elon Musk founded SpaceX in California.
→ Entities:
- Elon Musk (PERSON)
- SpaceX (ORG)
- California (LOC)
```
#### 🔹 Moderation
```text
💬 Enter text: I hate everything!
→ Result: FLAGGED (toxic content detected)
```
---
## 🧠 Architecture Overview
The application has a dual-mode architecture: **CLI** (interactive) and **API** (REST server), both sharing the same pipeline layer:
### CLI Architecture
```text
┌──────────────────────┐
│    InteractiveCLI    │
│  (src/cli/base.py)   │
└──────────┬───────────┘
           ▼
┌─────────────────┐
│  Command Layer  │  ← e.g. sentiment.py
│ (user commands) │
└────────┬────────┘
         ▼
┌─────────────────┐
│ Pipeline Layer  │  ← e.g. pipelines/sentiment.py
│   (ML logic)    │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Display Layer  │  ← cli/display.py
│ (format output) │
└─────────────────┘
```
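In code, the command layer boils down to a small base-class contract. A sketch of the idea (class and method names are illustrative; the real base class lives in `src/cli/base.py`):

```python
from abc import ABC, abstractmethod

class CLICommand(ABC):
    """Contract every user-facing command implements (illustrative sketch)."""
    name: str = ""
    description: str = ""

    @abstractmethod
    def execute(self, text: str) -> str:
        """Run the underlying pipeline and return formatted output."""

class EchoCommand(CLICommand):
    """Placeholder command: a real one would delegate to a pipeline module."""
    name = "echo"
    description = "Repeat the input text"

    def execute(self, text: str) -> str:
        return f"→ {text}"
```

The interactive shell only needs the `name`/`description`/`execute` surface, which keeps commands thin and pushes all ML logic into the pipeline layer.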
### API Architecture
```text
┌──────────────────────┐
│     FastAPI App      │
│   (src/api/app.py)   │
└──────────┬───────────┘
           ▼
┌─────────────────┐
│ Pydantic Models │  ← api/models.py
│  (validation)   │
└────────┬────────┘
         ▼
┌─────────────────┐
│ Pipeline Layer  │  ← e.g. pipelines/sentiment.py (shared with CLI)
│   (ML logic)    │
└────────┬────────┘
         ▼
┌─────────────────┐
│  JSON Response  │  ← automatic serialization
│  (HTTP output)  │
└─────────────────┘
```
### Key Concepts
| Layer | Description |
| ------------ | -------------------------------------------------------------------------- |
| **CLI** | Manages user input/output, help menus, and navigation between commands. |
| **API** | FastAPI application serving HTTP endpoints with automatic documentation. |
| **Command** | Encapsulates a single user-facing operation (e.g., run sentiment). |
| **Pipeline** | Wraps Hugging Face's `transformers.pipeline()` to perform inference. |
| **Models** | Pydantic schemas for request/response validation and serialization. |
| **Display** | Handles clean console rendering (colored output, tables, JSON formatting). |
| **Config** | Centralizes model names, limits, and global constants. |
---
## ⚙️ Configuration
All configuration is centralized in `src/config/settings.py`.
Example:
```python
class Config:
DEFAULT_MODELS = {
"sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
"fillmask": "bert-base-uncased",
"textgen": "gpt2",
"ner": "dslim/bert-base-NER",
"moderation":"unitary/toxic-bert"
}
MAX_LENGTH = 512
BATCH_SIZE = 8
```
You can easily modify model names to experiment with different checkpoints.
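A pipeline resolves its model name from these defaults unless the user overrides it. A self-contained sketch of that lookup (the `resolve_model` helper is illustrative, not part of the codebase, and the `Config` here is trimmed to two tasks):

```python
from typing import Optional

class Config:
    """Mirrors the defaults above, trimmed to two tasks for brevity."""
    DEFAULT_MODELS = {
        "sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
        "textgen": "gpt2",
    }
    MAX_LENGTH = 512

def resolve_model(task: str, override: Optional[str] = None) -> str:
    """Return the user override if given, else the configured default for the task."""
    if override:
        return override
    if task not in Config.DEFAULT_MODELS:
        raise ValueError(f"No default model configured for task {task!r}")
    return Config.DEFAULT_MODELS[task]
```

This is the pattern the API endpoints follow: an optional `model` field in the request falls back to the task's configured default.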
---
## 🧩 Extending the Playground
To create a new experiment (e.g., keyword extraction):
### For CLI Support
1. **Duplicate** `src/pipelines/template.py` → `src/pipelines/keywords.py`
Implement the `run()` or `analyze()` logic using a new Hugging Face pipeline.
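A possible shape for that pipeline, with lazy model loading so the CLI starts fast (the class name, Hugging Face task, and placeholder checkpoint are assumptions for illustration):

```python
class KeywordsAnalyzer:
    """Hypothetical keyword-extraction pipeline following template.py's shape."""

    def __init__(self, model: str = "your-keyphrase-model"):  # checkpoint name is a placeholder
        self.model_name = model
        self._pipe = None  # loaded on first use

    def _load(self):
        if self._pipe is None:
            from transformers import pipeline  # deferred: heavy import
            self._pipe = pipeline("token-classification", model=self.model_name)
        return self._pipe

    def analyze(self, text: str) -> list:
        if not text.strip():
            return []
        results = self._load()(text)
        # Deduplicate predicted words while preserving order
        seen, keywords = set(), []
        for r in results:
            word = r["word"].strip()
            if word and word not in seen:
                seen.add(word)
                keywords.append(word)
        return keywords
```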
2. **Create a Command** in `src/commands/keywords.py` to interact with users.
3. **Register the command** inside `src/main.py`:
```python
from src.commands.keywords import KeywordsCommand
cli.register_command(KeywordsCommand())
```
### For API Support
4. **Add Pydantic models** in `src/api/models.py`:
```python
from typing import List, Optional
from pydantic import BaseModel

class KeywordsRequest(BaseModel):
    text: str
    model: Optional[str] = None

class KeywordsResponse(BaseModel):
    success: bool
    keywords: List[str]
    model_used: str
```
5. **Add endpoint** in `src/api/app.py`:
```python
@app.post("/keywords", response_model=KeywordsResponse)
async def extract_keywords(request: KeywordsRequest):
    # Delegate to the shared KeywordsAnalyzer pipeline
    analyzer = KeywordsAnalyzer(model=request.model) if request.model else KeywordsAnalyzer()
    return KeywordsResponse(
        success=True,
        keywords=analyzer.analyze(request.text),
        model_used=analyzer.model_name,
    )
```
6. **Update configuration** in `Config.DEFAULT_MODELS`.
Both CLI and API will automatically share the same pipeline implementation!
---
## 🧪 Testing
You can use `pytest` for lightweight validation:
```bash
pip install pytest
pytest -q
```
Recommended structure:
```
tests/
├── test_sentiment.py
├── test_textgen.py
└── ...
```
---
## 🧰 Troubleshooting
### General Issues
| Issue | Cause / Solution |
| ---------------------------- | -------------------------------------------- |
| **`transformers` not found** | Check virtual environment activation. |
| **Torch fails to install** | Install CPU-only version from PyTorch index. |
| **Models download slowly** | Hugging Face caches them after first run. |
| **Unicode / accents broken** | Ensure terminal encoding is UTF-8. |
### API-Specific Issues
| Issue | Cause / Solution |
| ----------------------------- | ----------------------------------------------------- |
| **`FastAPI` not found** | Install with `pip install fastapi uvicorn[standard]`. |
| **Port already in use** | Use `--port 8001` or kill process on port 8000. |
| **CORS errors in browser** | Check `allow_origins` in `src/api/config.py`. |
| **422 Validation Error** | Check request body matches Pydantic models. |
| **500 Internal Server Error** | Check model loading and pipeline initialization. |
### Quick API Health Check
```bash
# Test if API is running
curl http://localhost:8000/health
# Test basic endpoint
curl -X POST "http://localhost:8000/sentiment" \
-H "Content-Type: application/json" \
-d '{"text": "test"}'
```
---
## 🧭 Development Guidelines
- Keep **Command** classes lightweight — no ML logic inside them.
- Reuse the **Pipeline Template** for new experiments.
- Format outputs consistently via the `DisplayFormatter`.
- Document all new models or commands in `README.md` and `settings.py`.
---
## 🧱 Roadmap
- [ ] Add non-interactive CLI flags (`--text`, `--task`)
- [ ] Add multilingual model options
- [ ] Add automatic test coverage
- [ ] Add logging and profiling utilities
- [ ] Add export to JSON/CSV results
---
## 📜 License
This project is licensed under the [MIT License](./LICENSE) — feel free to use it, modify it, and share it!
---