# 🧠 AI Lab Transformers CLI Playground
> A **pedagogical and technical project** designed for AI practitioners and students to experiment with Hugging Face Transformers through an **interactive Command-Line Interface (CLI)**.
> This playground provides ready-to-use NLP pipelines (Sentiment Analysis, Named Entity Recognition, Text Generation, Fill-Mask, Moderation, etc.) in a modular, extensible, and educational codebase.
---
## 📚 Overview
The **AI Lab Transformers CLI Playground** allows you to explore multiple natural language processing tasks directly from the terminal.
Each task (e.g., sentiment, NER, text generation) is implemented as a **Command Module**, which interacts with a **Pipeline Module** built on top of the `transformers` library.
The lab is intentionally structured to demonstrate **clean software design for ML codebases** — with strict separation between configuration, pipelines, CLI logic, and display formatting.
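For context, every Pipeline Module ultimately wraps a `transformers.pipeline()` call; the standalone example below shows the underlying API (the default checkpoint and the exact score shown are illustrative):
```python
from transformers import pipeline

# Load a sentiment-analysis pipeline (the model is downloaded and cached on first use).
classifier = pipeline("sentiment-analysis")

print(classifier("I absolutely love this project!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.9998...}]
```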
---
## 🗂️ Project Structure
```text
src/
├── __init__.py
├── main.py                 # CLI entry point
├── cli/
│   ├── __init__.py
│   ├── base.py             # CLICommand base class & interactive shell handler
│   └── display.py          # Console formatting utilities (tables, colors, results)
├── commands/               # User-facing commands wrapping pipeline logic
│   ├── __init__.py
│   ├── sentiment.py        # Sentiment analysis command
│   ├── fillmask.py         # Masked token prediction command
│   ├── textgen.py          # Text generation command
│   ├── ner.py              # Named Entity Recognition command
│   └── moderation.py       # Toxicity / content moderation command
├── pipelines/              # Machine learning logic (Hugging Face Transformers)
│   ├── __init__.py
│   ├── template.py         # Blueprint for creating new pipelines
│   ├── sentiment.py
│   ├── fillmask.py
│   ├── textgen.py
│   ├── ner.py
│   └── moderation.py
└── config/
    ├── __init__.py
    └── settings.py         # Global configuration (default models, parameters)
```
---
## ⚙️ Installation
### 🧾 Option 1: Using Poetry (Recommended)
> Poetry is used as the main dependency manager.
```bash
# 1. Create and activate a new virtual environment
poetry shell
# 2. Install dependencies
poetry install
```
This will automatically install all dependencies declared in `pyproject.toml`, including **transformers** and **torch**.
To run the CLI inside the Poetry environment:
```bash
poetry run python src/main.py
```
---
### 📦 Option 2: Using pip and requirements.txt
If you prefer using `requirements.txt` manually:
```bash
# 1. Create a virtual environment
python -m venv .venv
# 2. Activate it
# Linux/macOS
source .venv/bin/activate
# Windows PowerShell
.venv\Scripts\Activate.ps1
# 3. Install dependencies
pip install -r requirements.txt
```
---
## ▶️ Usage
Once installed, launch the CLI with:
```bash
python -m src.main
# or, if using Poetry
poetry run python src/main.py
```
You'll see an interactive menu listing the available commands:
```
Welcome to AI Lab - Transformers CLI Playground
Available commands:
  • sentiment    Analyze the sentiment of a text
  • fillmask     Predict masked words in a sentence
  • textgen      Generate text from a prompt
  • ner          Extract named entities from text
  • moderation   Detect toxic or unsafe content
```
### Example Sessions
#### 🔹 Sentiment Analysis
```text
💬 Enter text: I absolutely love this project!
→ Sentiment: POSITIVE (score: 0.998)
```
#### 🔹 Fill-Mask
```text
💬 Enter text: The capital of France is [MASK].
→ Predictions:
1) Paris score: 0.87
2) Lyon score: 0.04
3) London score: 0.02
```
#### 🔹 Text Generation
```text
💬 Prompt: Once upon a time
→ Output: Once upon a time there was a young AI learning to code...
```
#### 🔹 NER (Named Entity Recognition)
```text
💬 Enter text: Elon Musk founded SpaceX in California.
→ Entities:
- Elon Musk (PERSON)
- SpaceX (ORG)
- California (LOC)
```
#### 🔹 Moderation
```text
💬 Enter text: I hate everything!
→ Result: FLAGGED (toxic content detected)
```
---
## 🧠 Architecture Overview
The internal structure follows a clean **Command ↔ Pipeline ↔ Display** pattern:
```text
┌──────────────────────┐
│    InteractiveCLI    │
│  (src/cli/base.py)   │
└──────────┬───────────┘
           │
           ▼
┌─────────────────┐
│  Command Layer  │  ← e.g. commands/sentiment.py
│ (user commands) │
└───────┬─────────┘
        │
        ▼
┌─────────────────┐
│ Pipeline Layer  │  ← e.g. pipelines/sentiment.py
│   (ML logic)    │
└───────┬─────────┘
        │
        ▼
┌─────────────────┐
│  Display Layer  │  ← cli/display.py
│ (format output) │
└─────────────────┘
```
### Key Concepts
| Layer | Description |
|-------|--------------|
| **CLI** | Manages user input/output, help menus, and navigation between commands. |
| **Command** | Encapsulates a single user-facing operation (e.g., run sentiment). |
| **Pipeline** | Wraps Hugging Face's `transformers.pipeline()` to perform inference. |
| **Display** | Handles clean console rendering (colored output, tables, JSON formatting). |
| **Config** | Centralizes model names, limits, and global constants. |
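The snippet below is a minimal sketch of how these layers can fit together. The real class and method names in this repository (beyond `CLICommand` and `DisplayFormatter`, which the project mentions) may differ; `analyze()`, `show_sentiment()`, and `execute()` are assumed here for illustration.
```python
from transformers import pipeline


class DisplayFormatter:
    """Display layer: console rendering only, no ML logic."""

    def show_sentiment(self, result: dict) -> None:
        print(f"→ Sentiment: {result['label']} (score: {result['score']:.3f})")


class SentimentPipeline:
    """Pipeline layer: wraps the Hugging Face inference call."""

    def __init__(self, model_name: str = "distilbert-base-uncased-finetuned-sst-2-english"):
        self._pipe = pipeline("sentiment-analysis", model=model_name)

    def analyze(self, text: str) -> dict:
        # transformers returns a list of {'label': ..., 'score': ...} dicts.
        return self._pipe(text)[0]


class SentimentCommand:
    """Command layer: gathers user input and delegates to pipeline + display."""

    name = "sentiment"
    help = "Analyze the sentiment of a text"

    def __init__(self):
        self.pipeline = SentimentPipeline()
        self.formatter = DisplayFormatter()

    def execute(self) -> None:
        text = input("💬 Enter text: ")
        self.formatter.show_sentiment(self.pipeline.analyze(text))
```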
---
## ⚙️ Configuration
All configuration is centralized in `src/config/settings.py`.
Example:
```python
class Config:
    DEFAULT_MODELS = {
        "sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
        "fillmask": "bert-base-uncased",
        "textgen": "gpt2",
        "ner": "dslim/bert-base-NER",
        "moderation": "unitary/toxic-bert",
    }
    MAX_LENGTH = 512
    BATCH_SIZE = 8
```
You can easily modify model names to experiment with different checkpoints.
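A pipeline module would typically resolve its checkpoint from this config rather than hard-coding it; a minimal sketch, assuming `Config` is importable from `src.config.settings`:
```python
from transformers import pipeline

from src.config.settings import Config  # import path assumed

# Resolve the checkpoint from the central config so experiments only
# require editing settings.py, never the pipeline code itself.
model_name = Config.DEFAULT_MODELS["sentiment"]
classifier = pipeline("sentiment-analysis", model=model_name)
print(classifier("This playground is great!"))
```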
---
## 🧩 Extending the Playground
To create a new experiment (e.g., keyword extraction):
1. **Duplicate** `src/pipelines/template.py` → `src/pipelines/keywords.py` and implement the `run()` or `analyze()` logic using a new Hugging Face pipeline (see the sketch after this list).
2. **Create a Command** in `src/commands/keywords.py` to interact with users.
3. **Register the command** inside `src/main.py`:
```python
from src.commands.keywords import KeywordsCommand
cli.register_command(KeywordsCommand())
```
4. Optionally, add a model name in `Config.DEFAULT_MODELS`.
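
Putting the four steps together, a keyword-extraction experiment might look roughly like the sketch below. Everything here is hypothetical: the `KeywordsPipeline` and `KeywordsCommand` names, the choice of the `token-classification` task, and the `"keywords"` entry in `Config.DEFAULT_MODELS` are assumptions to adapt to your own checkpoint.
```python
# src/pipelines/keywords.py  (hypothetical sketch)
from transformers import pipeline

from src.config.settings import Config


class KeywordsPipeline:
    """Extracts keyphrases using a token-classification checkpoint."""

    def __init__(self):
        # Task and model are placeholders: pick any keyphrase-extraction
        # checkpoint on the Hugging Face Hub and add it to DEFAULT_MODELS.
        self._pipe = pipeline(
            "token-classification",
            model=Config.DEFAULT_MODELS["keywords"],
            aggregation_strategy="simple",
        )

    def analyze(self, text: str):
        return self._pipe(text)


# src/commands/keywords.py  (hypothetical sketch)
class KeywordsCommand:
    name = "keywords"
    help = "Extract keywords from a text"

    def __init__(self):
        self.pipeline = KeywordsPipeline()

    def execute(self) -> None:
        text = input("💬 Enter text: ")
        for entity in self.pipeline.analyze(text):
            print(f"- {entity['word']} (score: {entity['score']:.2f})")
```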
---
## 🧪 Testing
You can use `pytest` for lightweight validation:
```bash
pip install pytest
pytest -q
```
Recommended structure:
```
tests/
├── test_sentiment.py
├── test_textgen.py
└── ...
```
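A minimal test sketch, assuming a `SentimentPipeline` class with an `analyze()` method as in the architecture sketch above; the first run downloads the model, so mark such tests as slow if needed:
```python
# tests/test_sentiment.py  (names assumed — adapt to the actual pipeline API)
from src.pipelines.sentiment import SentimentPipeline


def test_sentiment_positive():
    pipeline = SentimentPipeline()
    result = pipeline.analyze("I absolutely love this project!")
    assert result["label"] == "POSITIVE"
    assert result["score"] > 0.5
```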
---
## 🧰 Troubleshooting
| Issue | Cause / Solution |
|-------|------------------|
| **`transformers` not found** | Check virtual environment activation. |
| **Torch fails to install** | Install CPU-only version from PyTorch index. |
| **Models download slowly** | Hugging Face caches them after first run. |
| **Unicode / accents broken** | Ensure terminal encoding is UTF-8. |
---
## 🧭 Development Guidelines
- Keep **Command** classes lightweight — no ML logic inside them.
- Reuse the **Pipeline Template** for new experiments.
- Format outputs consistently via the `DisplayFormatter`.
- Document all new models or commands in `README.md` and `settings.py`.
---
## 🧱 Roadmap
- [ ] Add non-interactive CLI flags (`--text`, `--task`)
- [ ] Add multilingual model options
- [ ] Add automatic test coverage
- [ ] Add logging and profiling utilities
- [ ] Add export of results to JSON/CSV
---
## 🪪 License
You can include a standard open-source license such as **MIT** or **Apache 2.0** depending on your use case.
---
## 🤝 Contributing
This repository is meant as an **educational sandbox** for experimenting with Transformers.
Pull requests are welcome for new models, better CLI UX, or educational improvements.
---
### ✨ Key Takeaways
- Modular and pedagogical design for training environments
- Clean separation between **I/O**, **ML logic**, and **UX**
- Easily extensible architecture for adding custom pipelines
- Perfect sandbox for students, researchers, and developers to learn modern NLP tools
---
> 🧩 Built for experimentation. Learn, break, and rebuild.