# 🧠 AI Lab – Transformers CLI Playground

> A **pedagogical and technical project** designed for AI practitioners and students to experiment with Hugging Face Transformers through an **interactive Command-Line Interface (CLI)**.
>
> This playground provides ready-to-use NLP pipelines (Sentiment Analysis, Named Entity Recognition, Text Generation, Fill-Mask, Moderation, etc.) in a modular, extensible, and educational codebase.

---
## 📚 Overview

The **AI Lab – Transformers CLI Playground** allows you to explore multiple natural language processing tasks directly from the terminal. Each task (e.g., sentiment, NER, text generation) is implemented as a **Command Module**, which interacts with a **Pipeline Module** built on top of the `transformers` library.

The lab is intentionally structured to demonstrate **clean software design for ML codebases** — with strict separation between configuration, pipelines, CLI logic, and display formatting.

---
## 🗂️ Project Structure

```text
src/
├── __init__.py
├── main.py              # CLI entry point
│
├── cli/
│   ├── __init__.py
│   ├── base.py          # CLICommand base class & interactive shell handler
│   └── display.py       # Console formatting utilities (tables, colors, results)
│
├── commands/            # User-facing commands wrapping pipeline logic
│   ├── __init__.py
│   ├── sentiment.py     # Sentiment analysis command
│   ├── fillmask.py      # Masked token prediction command
│   ├── textgen.py       # Text generation command
│   ├── ner.py           # Named Entity Recognition command
│   └── moderation.py    # Toxicity / content moderation command
│
├── pipelines/           # Machine learning logic (Hugging Face Transformers)
│   ├── __init__.py
│   ├── template.py      # Blueprint for creating new pipelines
│   ├── sentiment.py
│   ├── fillmask.py
│   ├── textgen.py
│   ├── ner.py
│   └── moderation.py
│
├── api/
│   ├── __init__.py
│   ├── app.py           # FastAPI application with all endpoints
│   ├── models.py        # Pydantic request/response models
│   └── config.py        # API-specific configuration
│
└── config/
    ├── __init__.py
    └── settings.py      # Global configuration (default models, parameters)
```

---
## ⚙️ Installation

### 🧾 Option 1 – Using Poetry (Recommended)

> Poetry is used as the main dependency manager.

```bash
# 1. Install dependencies (creates the virtual environment if one doesn't exist)
poetry install

# 2. Activate the environment
poetry shell   # on Poetry >= 2.0 this requires the shell plugin
```

This installs all dependencies declared in `pyproject.toml`, including **transformers**, **torch**, and **FastAPI** for the API mode.

To run the application inside the Poetry environment:

```bash
# CLI mode
poetry run python src/main.py --mode cli

# API mode
poetry run python src/main.py --mode api
```

---
### 📦 Option 2 – Using pip and requirements.txt

If you prefer using `requirements.txt` manually:

```bash
# 1. Create a virtual environment
python -m venv .venv

# 2. Activate it
source .venv/bin/activate        # Linux/macOS
.venv\Scripts\Activate.ps1       # Windows PowerShell

# 3. Install dependencies
pip install -r requirements.txt
```

---
## ▶️ Usage

The application supports two modes: **CLI** (interactive) and **API** (REST server).

### 🖥️ CLI Mode

Launch the interactive CLI with:

```bash
python -m src.main --mode cli
# or, if using Poetry
poetry run python src/main.py --mode cli
```

You'll see an interactive menu listing the available commands:

```text
Welcome to AI Lab - Transformers CLI Playground

Available commands:
  • sentiment   – Analyze the sentiment of a text
  • fillmask    – Predict masked words in a sentence
  • textgen     – Generate text from a prompt
  • ner         – Extract named entities from text
  • moderation  – Detect toxic or unsafe content
```
### 🌐 API Mode

Launch the FastAPI server with:

```bash
python -m src.main --mode api
# or with custom settings
python -m src.main --mode api --host 0.0.0.0 --port 8000 --reload
```

The API will be available at:

- **Swagger Documentation**: http://localhost:8000/docs
- **ReDoc Documentation**: http://localhost:8000/redoc
- **OpenAPI Schema**: http://localhost:8000/openapi.json

---
## 📡 API Endpoints

The REST API provides all CLI functionality through HTTP endpoints:

### Core Endpoints

| Method | Endpoint  | Description                      |
| ------ | --------- | -------------------------------- |
| `GET`  | `/`       | Health check and API information |
| `GET`  | `/health` | Detailed health status           |

### Individual Processing

| Method | Endpoint      | Description              | Input                                                              |
| ------ | ------------- | ------------------------ | ------------------------------------------------------------------ |
| `POST` | `/sentiment`  | Analyze text sentiment   | `{"text": "string", "model": "optional"}`                          |
| `POST` | `/fillmask`   | Fill masked words        | `{"text": "Hello [MASK]", "model": "optional"}`                    |
| `POST` | `/textgen`    | Generate text            | `{"text": "prompt", "model": "optional"}`                          |
| `POST` | `/ner`        | Named entity recognition | `{"text": "string", "model": "optional"}`                          |
| `POST` | `/qa`         | Question answering       | `{"question": "string", "context": "string", "model": "optional"}` |
| `POST` | `/moderation` | Content moderation       | `{"text": "string", "model": "optional"}`                          |

### Batch Processing

| Method | Endpoint            | Description                          | Input                                                |
| ------ | ------------------- | ------------------------------------ | ---------------------------------------------------- |
| `POST` | `/sentiment/batch`  | Process multiple texts               | `{"texts": ["text1", "text2"], "model": "optional"}` |
| `POST` | `/fillmask/batch`   | Fill multiple masked texts           | `{"texts": ["text1 [MASK]"], "model": "optional"}`   |
| `POST` | `/textgen/batch`    | Generate from multiple prompts       | `{"texts": ["prompt1"], "model": "optional"}`        |
| `POST` | `/ner/batch`        | Extract entities from multiple texts | `{"texts": ["text1"], "model": "optional"}`          |
| `POST` | `/moderation/batch` | Moderate multiple texts              | `{"texts": ["text1"], "model": "optional"}`          |
### Example API Usage

#### 🔹 Sentiment Analysis

```bash
curl -X POST "http://localhost:8000/sentiment" \
  -H "Content-Type: application/json" \
  -d '{"text": "I absolutely love this project!"}'
```

Response:

```json
{
  "success": true,
  "label": "POSITIVE",
  "score": 0.998,
  "model_used": "distilbert-base-uncased-finetuned-sst-2-english"
}
```
#### 🔹 Named Entity Recognition

```bash
curl -X POST "http://localhost:8000/ner" \
  -H "Content-Type: application/json" \
  -d '{"text": "Elon Musk founded SpaceX in California."}'
```

Response (`dslim/bert-base-NER` uses the label set `PER`, `ORG`, `LOC`, `MISC`):

```json
{
  "success": true,
  "entities": [
    { "word": "Elon Musk", "label": "PER", "score": 0.999 },
    { "word": "SpaceX", "label": "ORG", "score": 0.998 },
    { "word": "California", "label": "LOC", "score": 0.995 }
  ],
  "model_used": "dslim/bert-base-NER"
}
```
#### 🔹 Batch Processing

```bash
curl -X POST "http://localhost:8000/sentiment/batch" \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Great product!", "Terrible experience", "It was okay"]}'
```

Response (the default SST-2 model is binary, so every text is labeled either `POSITIVE` or `NEGATIVE`):

```json
{
  "success": true,
  "results": [
    { "label": "POSITIVE", "score": 0.998 },
    { "label": "NEGATIVE", "score": 0.995 },
    { "label": "POSITIVE", "score": 0.643 }
  ],
  "model_used": "distilbert-base-uncased-finetuned-sst-2-english"
}
```

---
## 🖥️ CLI Examples

#### 🔹 Sentiment Analysis

```text
💬 Enter text: I absolutely love this project!
→ Sentiment: POSITIVE (score: 0.998)
```

#### 🔹 Fill-Mask

```text
💬 Enter text: The capital of France is [MASK].
→ Predictions:
   1) Paris    score: 0.87
   2) Lyon     score: 0.04
   3) London   score: 0.02
```

#### 🔹 Text Generation

```text
💬 Prompt: Once upon a time
→ Output: Once upon a time there was a young AI learning to code...
```

#### 🔹 NER (Named Entity Recognition)

```text
💬 Enter text: Elon Musk founded SpaceX in California.
→ Entities:
   - Elon Musk   (PERSON)
   - SpaceX      (ORG)
   - California  (LOC)
```

#### 🔹 Moderation

```text
💬 Enter text: I hate everything!
→ Result: FLAGGED (toxic content detected)
```

---
## 🧠 Architecture Overview

The application has a dual-mode architecture — **CLI** (interactive) and **API** (REST server) — with both modes sharing the same pipeline layer:

### CLI Architecture

```text
┌──────────────────────┐
│    InteractiveCLI    │
│  (src/cli/base.py)   │
└──────────┬───────────┘
           │
           ▼
  ┌─────────────────┐
  │  Command Layer  │ ← e.g. commands/sentiment.py
  │ (user commands) │
  └───────┬─────────┘
          │
          ▼
  ┌─────────────────┐
  │ Pipeline Layer  │ ← e.g. pipelines/sentiment.py
  │   (ML logic)    │
  └───────┬─────────┘
          │
          ▼
  ┌─────────────────┐
  │  Display Layer  │ ← cli/display.py
  │ (format output) │
  └─────────────────┘
```
### API Architecture

```text
┌──────────────────────┐
│     FastAPI App      │
│   (src/api/app.py)   │
└──────────┬───────────┘
           │
           ▼
  ┌─────────────────┐
  │ Pydantic Models │ ← api/models.py
  │  (validation)   │
  └───────┬─────────┘
          │
          ▼
  ┌─────────────────┐
  │ Pipeline Layer  │ ← e.g. pipelines/sentiment.py (shared with the CLI)
  │   (ML logic)    │
  └───────┬─────────┘
          │
          ▼
  ┌─────────────────┐
  │  JSON Response  │ ← automatic serialization
  │  (HTTP output)  │
  └─────────────────┘
```
### Key Concepts

| Layer        | Description                                                                |
| ------------ | -------------------------------------------------------------------------- |
| **CLI**      | Manages user input/output, help menus, and navigation between commands.    |
| **API**      | FastAPI application serving HTTP endpoints with automatic documentation.   |
| **Command**  | Encapsulates a single user-facing operation (e.g., run sentiment).         |
| **Pipeline** | Wraps Hugging Face's `transformers.pipeline()` to perform inference.       |
| **Models**   | Pydantic schemas for request/response validation and serialization.        |
| **Display**  | Handles clean console rendering (colored output, tables, JSON formatting). |
| **Config**   | Centralizes model names, limits, and global constants.                     |
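The Command/Pipeline split can be sketched as follows (a minimal illustration — the `CLICommand` attributes and the pipeline's `analyze()` method name are assumptions; the real base class lives in `src/cli/base.py`):

```python
# Minimal sketch of the Command layer. The CLICommand interface and the
# pipeline's analyze() method are assumptions based on the structure above;
# the real base class lives in src/cli/base.py.
class CLICommand:
    """Base class every user-facing command implements."""
    name: str = ""
    description: str = ""

    def run(self, text: str) -> dict:
        raise NotImplementedError


class SentimentCommand(CLICommand):
    """Thin wrapper: no ML logic here, only delegation to the pipeline layer."""
    name = "sentiment"
    description = "Analyze the sentiment of a text"

    def __init__(self, pipeline):
        self.pipeline = pipeline  # injected, so tests can pass a stub

    def run(self, text: str) -> dict:
        # The Display layer (cli/display.py) formats this dict for the console.
        return self.pipeline.analyze(text)
```

Keeping the command this thin is what lets the API layer reuse the exact same pipeline object.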

---

## ⚙️ Configuration

All configuration is centralized in `src/config/settings.py`.

Example:

```python
class Config:
    DEFAULT_MODELS = {
        "sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
        "fillmask": "bert-base-uncased",
        "textgen": "gpt2",
        "ner": "dslim/bert-base-NER",
        "moderation": "unitary/toxic-bert",
    }
    MAX_LENGTH = 512
    BATCH_SIZE = 8
```

You can easily modify model names to experiment with different checkpoints.
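For instance, the optional `model` field accepted by the API endpoints can fall back to these defaults (an illustrative sketch — `resolve_model` is a hypothetical helper, not part of the codebase):

```python
# Hypothetical helper showing the override-or-default pattern; this Config
# mirrors the shape of src/config/settings.py.
from typing import Optional

class Config:
    DEFAULT_MODELS = {"sentiment": "distilbert-base-uncased-finetuned-sst-2-english"}

def resolve_model(task: str, override: Optional[str] = None) -> str:
    """Use the caller-supplied checkpoint if given, else the task default."""
    return override or Config.DEFAULT_MODELS[task]

print(resolve_model("sentiment"))
# → distilbert-base-uncased-finetuned-sst-2-english
print(resolve_model("sentiment", override="nlptown/bert-base-multilingual-uncased-sentiment"))
# → nlptown/bert-base-multilingual-uncased-sentiment
```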

---

## 🧩 Extending the Playground

To create a new experiment (e.g., keyword extraction):

### For CLI Support

1. **Duplicate** `src/pipelines/template.py` → `src/pipelines/keywords.py`
   Implement the `run()` or `analyze()` logic using a new Hugging Face pipeline.

2. **Create a Command** in `src/commands/keywords.py` to interact with users.

3. **Register the command** inside `src/main.py`:

```python
from src.commands.keywords import KeywordsCommand

cli.register_command(KeywordsCommand())
```
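The new pipeline might look like the following (a sketch only — the `analyze()` contract is assumed from `template.py`, the model name is a placeholder, and the class is named `KeywordsAnalyzer` to match step 5 below):

```python
# src/pipelines/keywords.py — hypothetical sketch; the model name is a
# placeholder and the analyze() signature is assumed from template.py.
class KeywordsAnalyzer:
    def __init__(self, model_name: str = "<your-keyphrase-checkpoint>"):
        self.model_name = model_name
        self._pipe = None  # lazy: the model loads on first use, not at import

    def _load(self):
        if self._pipe is None:
            from transformers import pipeline  # deferred so tests can stub self._pipe
            self._pipe = pipeline("token-classification", model=self.model_name)
        return self._pipe

    def analyze(self, text: str) -> dict:
        predictions = self._load()(text)
        keywords = sorted({p["word"] for p in predictions})
        return {"success": True, "keywords": keywords, "model_used": self.model_name}
```

Loading lazily keeps CLI startup fast and lets tests replace `self._pipe` with a stub.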

### For API Support

4. **Add Pydantic models** in `src/api/models.py`:

```python
from typing import List, Optional
from pydantic import BaseModel

class KeywordsRequest(BaseModel):
    text: str
    model: Optional[str] = None

class KeywordsResponse(BaseModel):
    success: bool
    keywords: List[str]
    model_used: str
```

5. **Add endpoint** in `src/api/app.py`:

```python
@app.post("/keywords", response_model=KeywordsResponse)
async def extract_keywords(request: KeywordsRequest):
    # Delegate to the shared KeywordsAnalyzer pipeline (src/pipelines/keywords.py)
    analyzer = KeywordsAnalyzer(request.model) if request.model else KeywordsAnalyzer()
    result = analyzer.analyze(request.text)
    return KeywordsResponse(
        success=True,
        keywords=result["keywords"],
        model_used=result["model_used"],
    )
```

6. **Update configuration** in `Config.DEFAULT_MODELS`.

Both CLI and API will automatically share the same pipeline implementation!

---

## 🧪 Testing

You can use `pytest` for lightweight validation:

```bash
pip install pytest
pytest -q
```

Recommended structure:

```text
tests/
├── test_sentiment.py
├── test_textgen.py
└── ...
```
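A first test can stub the pipeline so the suite runs without downloading any model (a sketch — the `analyze()` contract is an assumption to match your pipeline classes):

```python
# tests/test_sentiment.py — illustrative; replace the stub with the real
# SentimentPipeline (or monkeypatch it) once the interface is stable.
class StubSentimentPipeline:
    """Stand-in for src/pipelines/sentiment.py with the same assumed contract."""
    def analyze(self, text: str) -> dict:
        label = "POSITIVE" if "love" in text.lower() else "NEGATIVE"
        return {"label": label, "score": 0.99}

def test_analyze_returns_label_and_score():
    result = StubSentimentPipeline().analyze("I love this project!")
    assert result["label"] == "POSITIVE"
    assert 0.0 <= result["score"] <= 1.0
```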

---

## 🧰 Troubleshooting

### General Issues

| Issue                        | Cause / Solution                             |
| ---------------------------- | -------------------------------------------- |
| **`transformers` not found** | Check virtual environment activation.        |
| **Torch fails to install**   | Install CPU-only version from PyTorch index. |
| **Models download slowly**   | Hugging Face caches them after first run.    |
| **Unicode / accents broken** | Ensure terminal encoding is UTF-8.           |

### API-Specific Issues

| Issue                         | Cause / Solution                                        |
| ----------------------------- | ------------------------------------------------------- |
| **`FastAPI` not found**       | Install with `pip install fastapi "uvicorn[standard]"`. |
| **Port already in use**       | Use `--port 8001` or kill the process on port 8000.     |
| **CORS errors in browser**    | Check `allow_origins` in `src/api/config.py`.           |
| **422 Validation Error**      | Check that the request body matches the Pydantic models. |
| **500 Internal Server Error** | Check model loading and pipeline initialization.        |

### Quick API Health Check

```bash
# Test if the API is running
curl http://localhost:8000/health

# Test a basic endpoint
curl -X POST "http://localhost:8000/sentiment" \
  -H "Content-Type: application/json" \
  -d '{"text": "test"}'
```

---

## 🧭 Development Guidelines

- Keep **Command** classes lightweight — no ML logic inside them.
- Reuse the **Pipeline Template** for new experiments.
- Format outputs consistently via the `DisplayFormatter`.
- Document all new models and commands in `README.md` and `settings.py`.

---

## 🧱 Roadmap

- [ ] Add non-interactive CLI flags (`--text`, `--task`)
- [ ] Add multilingual model options
- [ ] Add automatic test coverage
- [ ] Add logging and profiling utilities
- [ ] Add export of results to JSON/CSV

---

## 📜 License

This project is licensed under the [MIT License](./LICENSE) — feel free to use it, modify it, and share it!

---