# 🧠 AI Lab Transformers CLI Playground
> A **pedagogical and technical project** designed for AI practitioners and students to experiment with Hugging Face Transformers through an **interactive Command-Line Interface (CLI)**.
> This playground provides ready-to-use NLP pipelines (Sentiment Analysis, Named Entity Recognition, Text Generation, Fill-Mask, Moderation, etc.) in a modular, extensible, and educational codebase.
---
## 📚 Overview
The **AI Lab Transformers CLI Playground** allows you to explore multiple natural language processing tasks directly from the terminal.
Each task (e.g., sentiment, NER, text generation) is implemented as a **Command Module**, which interacts with a **Pipeline Module** built on top of the `transformers` library.
The lab is intentionally structured to demonstrate **clean software design for ML codebases** — with strict separation between configuration, pipelines, CLI logic, and display formatting.
---
## 🗂️ Project Structure
```text
src/
├── __init__.py
├── main.py # CLI entry point
├── cli/
│ ├── __init__.py
│ ├── base.py # CLICommand base class & interactive shell handler
│ └── display.py # Console formatting utilities (tables, colors, results)
├── commands/ # User-facing commands wrapping pipeline logic
│ ├── __init__.py
│ ├── sentiment.py # Sentiment analysis command
│ ├── fillmask.py # Masked token prediction command
│ ├── textgen.py # Text generation command
│ ├── ner.py # Named Entity Recognition command
│ └── moderation.py # Toxicity / content moderation command
├── pipelines/ # Machine learning logic (Hugging Face Transformers)
│ ├── __init__.py
│ ├── template.py # Blueprint for creating new pipelines
│ ├── sentiment.py
│ ├── fillmask.py
│ ├── textgen.py
│ ├── ner.py
│ └── moderation.py
├── api/
│ ├── __init__.py
│ ├── app.py # FastAPI application with all endpoints
│ ├── models.py # Pydantic request/response models
│ └── config.py # API-specific configuration
└── config/
├── __init__.py
└── settings.py # Global configuration (default models, parameters)
```
---
## ⚙️ Installation
### 🧾 Option 1: Using Poetry (Recommended)
> Poetry is used as the main dependency manager.
```bash
# 1. Install dependencies (Poetry creates the virtual environment if needed)
poetry install
# 2. Spawn a shell inside the environment
poetry shell
```
This will automatically install all dependencies declared in `pyproject.toml`, including **transformers**, **torch**, and **FastAPI** for the API mode.
To run the application inside the Poetry environment:
```bash
# CLI mode
poetry run python src/main.py --mode cli
# API mode
poetry run python src/main.py --mode api
```
---
### 📦 Option 2: Using pip and requirements.txt
If you prefer using `requirements.txt` manually:
```bash
# 1. Create a virtual environment
python -m venv .venv
# 2. Activate it
# Linux/macOS
source .venv/bin/activate
# Windows PowerShell
.venv\Scripts\Activate.ps1
# 3. Install dependencies
pip install -r requirements.txt
```
---
## ▶️ Usage
The application supports two modes: **CLI** (interactive) and **API** (REST server).
### 🖥️ CLI Mode
Launch the interactive CLI with:
```bash
python -m src.main --mode cli
# or, if using Poetry
poetry run python src/main.py --mode cli
```
You'll see an interactive menu listing the available commands:
```
Welcome to AI Lab - Transformers CLI Playground
Available commands:
• sentiment Analyze the sentiment of a text
• fillmask Predict masked words in a sentence
• textgen Generate text from a prompt
• ner Extract named entities from text
• moderation Detect toxic or unsafe content
```
### 🌐 API Mode
Launch the FastAPI server with:
```bash
python -m src.main --mode api
# or with custom settings
python -m src.main --mode api --host 0.0.0.0 --port 8000 --reload
```
The API will be available at:
- **Swagger Documentation**: http://localhost:8000/docs
- **ReDoc Documentation**: http://localhost:8000/redoc
- **OpenAPI Schema**: http://localhost:8000/openapi.json
## 📡 API Endpoints
The REST API provides all CLI functionality through HTTP endpoints:
### Core Endpoints
| Method | Endpoint | Description |
| ------ | --------- | -------------------------------- |
| `GET` | `/` | Health check and API information |
| `GET` | `/health` | Detailed health status |
### Individual Processing
| Method | Endpoint | Description | Input |
| ------ | ------------- | ------------------------ | ------------------------------------------------------------------ |
| `POST` | `/sentiment` | Analyze text sentiment | `{"text": "string", "model": "optional"}` |
| `POST` | `/fillmask` | Fill masked words | `{"text": "Hello [MASK]", "model": "optional"}` |
| `POST` | `/textgen` | Generate text | `{"text": "prompt", "model": "optional"}` |
| `POST` | `/ner` | Named entity recognition | `{"text": "string", "model": "optional"}` |
| `POST` | `/qa` | Question answering | `{"question": "string", "context": "string", "model": "optional"}` |
| `POST` | `/moderation` | Content moderation | `{"text": "string", "model": "optional"}` |
### Batch Processing
| Method | Endpoint | Description | Input |
| ------ | ------------------- | ------------------------------------ | ---------------------------------------------------- |
| `POST` | `/sentiment/batch` | Process multiple texts | `{"texts": ["text1", "text2"], "model": "optional"}` |
| `POST` | `/fillmask/batch` | Fill multiple masked texts | `{"texts": ["text1 [MASK]"], "model": "optional"}` |
| `POST` | `/textgen/batch` | Generate from multiple prompts | `{"texts": ["prompt1"], "model": "optional"}` |
| `POST` | `/ner/batch` | Extract entities from multiple texts | `{"texts": ["text1"], "model": "optional"}` |
| `POST` | `/moderation/batch` | Moderate multiple texts | `{"texts": ["text1"], "model": "optional"}` |
### Example API Usage
#### 🔹 Sentiment Analysis
```bash
curl -X POST "http://localhost:8000/sentiment" \
-H "Content-Type: application/json" \
-d '{"text": "I absolutely love this project!"}'
```
Response:
```json
{
"success": true,
"label": "POSITIVE",
"score": 0.998,
"model_used": "distilbert-base-uncased-finetuned-sst-2-english"
}
```
#### 🔹 Named Entity Recognition
```bash
curl -X POST "http://localhost:8000/ner" \
-H "Content-Type: application/json" \
-d '{"text": "Elon Musk founded SpaceX in California."}'
```
Response:
```json
{
"success": true,
"entities": [
{ "word": "Elon Musk", "label": "PERSON", "score": 0.999 },
{ "word": "SpaceX", "label": "ORG", "score": 0.998 },
{ "word": "California", "label": "LOC", "score": 0.995 }
],
"model_used": "dslim/bert-base-NER"
}
```
#### 🔹 Batch Processing
```bash
curl -X POST "http://localhost:8000/sentiment/batch" \
-H "Content-Type: application/json" \
-d '{"texts": ["Great product!", "Terrible experience", "It was okay"]}'
```
Response:
```json
{
"success": true,
"results": [
{ "label": "POSITIVE", "score": 0.998 },
{ "label": "NEGATIVE", "score": 0.995 },
{ "label": "POSITIVE", "score": 0.654 }
],
"model_used": "distilbert-base-uncased-finetuned-sst-2-english"
}
```
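The same batch endpoint can also be called from Python. Below is a minimal stdlib sketch; the helper names are illustrative and the base URL assumes the default host/port from the examples above:

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # assumed default from the examples above

def build_batch_payload(texts, model=None):
    """Build the JSON body expected by the batch endpoints."""
    payload = {"texts": list(texts)}
    if model is not None:
        payload["model"] = model
    return payload

def post_batch_sentiment(texts, model=None):
    """POST to /sentiment/batch and return the parsed JSON response."""
    body = json.dumps(build_batch_payload(texts, model)).encode("utf-8")
    req = urllib.request.Request(
        f"{API_URL}/sentiment/batch",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def labels(response):
    """Pull just the predicted labels out of a batch response."""
    return [r["label"] for r in response.get("results", [])]
```

With the server running, `post_batch_sentiment(["Great product!"])` returns the same JSON shown above, and `labels(...)` reduces it to a list of labels.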
---
## 🖥️ CLI Examples
#### 🔹 Sentiment Analysis
```text
💬 Enter text: I absolutely love this project!
→ Sentiment: POSITIVE (score: 0.998)
```
#### 🔹 FillMask
```text
💬 Enter text: The capital of France is [MASK].
→ Predictions:
1) Paris score: 0.87
2) Lyon score: 0.04
3) London score: 0.02
```
#### 🔹 Text Generation
```text
💬 Prompt: Once upon a time
→ Output: Once upon a time there was a young AI learning to code...
```
#### 🔹 NER (Named Entity Recognition)
```text
💬 Enter text: Elon Musk founded SpaceX in California.
→ Entities:
- Elon Musk (PERSON)
- SpaceX (ORG)
- California (LOC)
```
#### 🔹 Moderation
```text
💬 Enter text: I hate everything!
→ Result: FLAGGED (toxic content detected)
```
---
## 🧠 Architecture Overview
The application has a dual-mode architecture: **CLI** (interactive) and **API** (REST server), both sharing the same pipeline layer:
### CLI Architecture
```text
┌──────────────────────┐
│    InteractiveCLI    │
│  (src/cli/base.py)   │
└──────────┬───────────┘
           ▼
┌─────────────────┐
│  Command Layer  │  ← e.g. sentiment.py
│ (user commands) │
└────────┬────────┘
         ▼
┌─────────────────┐
│ Pipeline Layer  │  ← e.g. pipelines/sentiment.py
│   (ML logic)    │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Display Layer  │  ← cli/display.py
│ (format output) │
└─────────────────┘
```
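In code, the command layer boils down to a small base-class contract. A sketch of the idea (class and method names are illustrative; the real base class lives in `src/cli/base.py`):

```python
from abc import ABC, abstractmethod

class CLICommand(ABC):
    """Contract every user-facing command implements (illustrative sketch)."""
    name: str = ""
    description: str = ""

    @abstractmethod
    def execute(self, text: str) -> str:
        """Run the underlying pipeline and return formatted output."""

class EchoCommand(CLICommand):
    """Placeholder command: a real one would delegate to a pipeline module."""
    name = "echo"
    description = "Repeat the input text"

    def execute(self, text: str) -> str:
        return f"→ {text}"
```

The interactive shell only needs the `name`/`description`/`execute` surface, which keeps commands thin and pushes all ML logic into the pipeline layer.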
### API Architecture
```text
┌──────────────────────┐
│     FastAPI App      │
│   (src/api/app.py)   │
└──────────┬───────────┘
           ▼
┌─────────────────┐
│ Pydantic Models │  ← api/models.py
│  (validation)   │
└────────┬────────┘
         ▼
┌─────────────────┐
│ Pipeline Layer  │  ← e.g. pipelines/sentiment.py (shared with CLI)
│   (ML logic)    │
└────────┬────────┘
         ▼
┌─────────────────┐
│  JSON Response  │  ← automatic serialization
│  (HTTP output)  │
└─────────────────┘
```
### Key Concepts
| Layer | Description |
| ------------ | -------------------------------------------------------------------------- |
| **CLI** | Manages user input/output, help menus, and navigation between commands. |
| **API** | FastAPI application serving HTTP endpoints with automatic documentation. |
| **Command** | Encapsulates a single user-facing operation (e.g., run sentiment). |
| **Pipeline** | Wraps Hugging Face's `transformers.pipeline()` to perform inference. |
| **Models** | Pydantic schemas for request/response validation and serialization. |
| **Display** | Handles clean console rendering (colored output, tables, JSON formatting). |
| **Config** | Centralizes model names, limits, and global constants. |
---
## ⚙️ Configuration
All configuration is centralized in `src/config/settings.py`.
Example:
```python
class Config:
DEFAULT_MODELS = {
"sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
"fillmask": "bert-base-uncased",
"textgen": "gpt2",
"ner": "dslim/bert-base-NER",
"moderation":"unitary/toxic-bert"
}
MAX_LENGTH = 512
BATCH_SIZE = 8
```
You can easily modify model names to experiment with different checkpoints.
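A pipeline resolves its model name from these defaults unless the user overrides it. A self-contained sketch of that lookup (the `resolve_model` helper is illustrative, not part of the codebase, and the `Config` here is trimmed to two tasks):

```python
from typing import Optional

class Config:
    """Mirrors the defaults above, trimmed to two tasks for brevity."""
    DEFAULT_MODELS = {
        "sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
        "textgen": "gpt2",
    }
    MAX_LENGTH = 512

def resolve_model(task: str, override: Optional[str] = None) -> str:
    """Return the user override if given, else the configured default for the task."""
    if override:
        return override
    if task not in Config.DEFAULT_MODELS:
        raise ValueError(f"No default model configured for task {task!r}")
    return Config.DEFAULT_MODELS[task]
```

This is the pattern the API endpoints follow: an optional `model` field in the request falls back to the task's configured default.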
---
## 🧩 Extending the Playground
To create a new experiment (e.g., keyword extraction):
### For CLI Support
1. **Duplicate** `src/pipelines/template.py` → `src/pipelines/keywords.py`
Implement the `run()` or `analyze()` logic using a new Hugging Face pipeline.
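A possible shape for that pipeline, with lazy model loading so the CLI starts fast (the class name, Hugging Face task, and placeholder checkpoint are assumptions for illustration):

```python
class KeywordsAnalyzer:
    """Hypothetical keyword-extraction pipeline following template.py's shape."""

    def __init__(self, model: str = "your-keyphrase-model"):  # checkpoint name is a placeholder
        self.model_name = model
        self._pipe = None  # loaded on first use

    def _load(self):
        if self._pipe is None:
            from transformers import pipeline  # deferred: heavy import
            self._pipe = pipeline("token-classification", model=self.model_name)
        return self._pipe

    def analyze(self, text: str) -> list:
        if not text.strip():
            return []
        results = self._load()(text)
        # Deduplicate predicted words while preserving order
        seen, keywords = set(), []
        for r in results:
            word = r["word"].strip()
            if word and word not in seen:
                seen.add(word)
                keywords.append(word)
        return keywords
```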
2. **Create a Command** in `src/commands/keywords.py` to interact with users.
3. **Register the command** inside `src/main.py`:
```python
from src.commands.keywords import KeywordsCommand
cli.register_command(KeywordsCommand())
```
### For API Support
4. **Add Pydantic models** in `src/api/models.py`:
```python
from typing import List, Optional
from pydantic import BaseModel

class KeywordsRequest(BaseModel):
    text: str
    model: Optional[str] = None

class KeywordsResponse(BaseModel):
    success: bool
    keywords: List[str]
    model_used: str
```
5. **Add endpoint** in `src/api/app.py`:
```python
@app.post("/keywords", response_model=KeywordsResponse)
async def extract_keywords(request: KeywordsRequest):
    # Delegate to the shared KeywordsAnalyzer pipeline
    analyzer = KeywordsAnalyzer(model=request.model) if request.model else KeywordsAnalyzer()
    return KeywordsResponse(
        success=True,
        keywords=analyzer.analyze(request.text),
        model_used=analyzer.model_name,
    )
```
6. **Update configuration** in `Config.DEFAULT_MODELS`.
Both CLI and API will automatically share the same pipeline implementation!
---
## 🧪 Testing
You can use `pytest` for lightweight validation:
```bash
pip install pytest
pytest -q
```
Recommended structure:
```
tests/
├── test_sentiment.py
├── test_textgen.py
└── ...
```
---
## 🧰 Troubleshooting
### General Issues
| Issue | Cause / Solution |
| ---------------------------- | -------------------------------------------- |
| **`transformers` not found** | Check virtual environment activation. |
| **Torch fails to install** | Install CPU-only version from PyTorch index. |
| **Models download slowly** | Hugging Face caches them after first run. |
| **Unicode / accents broken** | Ensure terminal encoding is UTF-8. |
### API-Specific Issues
| Issue | Cause / Solution |
| ----------------------------- | ----------------------------------------------------- |
| **`FastAPI` not found** | Install with `pip install fastapi uvicorn[standard]`. |
| **Port already in use** | Use `--port 8001` or kill process on port 8000. |
| **CORS errors in browser** | Check `allow_origins` in `src/api/config.py`. |
| **422 Validation Error** | Check request body matches Pydantic models. |
| **500 Internal Server Error** | Check model loading and pipeline initialization. |
### Quick API Health Check
```bash
# Test if API is running
curl http://localhost:8000/health
# Test basic endpoint
curl -X POST "http://localhost:8000/sentiment" \
-H "Content-Type: application/json" \
-d '{"text": "test"}'
```
---
## 🧭 Development Guidelines
- Keep **Command** classes lightweight — no ML logic inside them.
- Reuse the **Pipeline Template** for new experiments.
- Format outputs consistently via the `DisplayFormatter`.
- Document all new models or commands in `README.md` and `settings.py`.
---
## 🧱 Roadmap
- [ ] Add non-interactive CLI flags (`--text`, `--task`)
- [ ] Add multilingual model options
- [ ] Add automatic test coverage
- [ ] Add logging and profiling utilities
- [ ] Add export to JSON/CSV results
---
## 📜 License
This project is licensed under the [MIT License](./LICENSE) — feel free to use it, modify it, and share it!
---