# 🧠 AI Lab Transformers CLI Playground
> A **pedagogical and technical project** designed for AI practitioners and students to explore **Hugging Face Transformers** through an **interactive Command-Line Interface (CLI)** or a **REST API**.
> This playground provides ready-to-use NLP pipelines — including **Sentiment Analysis**, **Named Entity Recognition**, **Text Generation**, **Fill-Mask**, **Question Answering (QA)**, **Moderation**, and more — in a modular, extensible, and educational codebase.
---
<p align="center">
<img src="https://img.shields.io/badge/Python-3.13-blue.svg" alt="Python"/>
<img src="https://img.shields.io/badge/Built_with-Poetry-purple.svg" alt="Poetry"/>
<img src="https://img.shields.io/badge/🤗-Transformers-orange.svg" alt="Transformers"/>
<img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License"/>
</p>

---
## 📑 Table of Contents
- [📚 Overview](#-overview)
- [🗂️ Project Structure](#-project-structure)
- [⚙️ Installation](#-installation)
  - [🧾 Option 1 Poetry (Recommended)](#-option-1-poetry-recommended)
  - [📦 Option 2 Pip + requirements.txt](#-option-2-pip--requirementstxt)
- [▶️ Usage](#-usage)
  - [🖥️ CLI Mode](#-cli-mode)
  - [🌐 API Mode](#-api-mode)
- [📡 API Endpoints](#-api-endpoints)
- [🖥️ CLI Examples](#-cli-examples)
- [🧠 Architecture Overview](#-architecture-overview)
- [⚙️ Configuration](#-configuration)
- [🧩 Extending the Playground](#-extending-the-playground)
- [🧰 Troubleshooting](#-troubleshooting)
- [🧭 Development Guidelines](#-development-guidelines)
- [🧱 Roadmap](#-roadmap)
- [📜 License](#-license)
---
## 📚 Overview
The **AI Lab Transformers CLI Playground** enables users to explore **multiple NLP tasks directly from the terminal or via HTTP APIs**.
Each task (sentiment, NER, text generation, etc.) is implemented as a **Command Module** that communicates with a **Pipeline Module** powered by Hugging Face's `transformers` library.
The project demonstrates **clean ML code architecture** with strict separation between:
- Configuration
- Pipelines
- CLI logic
- Display formatting

It's a great educational resource for learning **how to structure ML applications** professionally.
---
## 🗂️ Project Structure
```text
src/
├── main.py # CLI entry point
├── cli/
│ ├── base.py # CLICommand base class & interactive shell
│ └── display.py # Console formatting utilities (colors, tables, results)
├── commands/ # User-facing commands wrapping pipeline logic
│ ├── sentiment.py # Sentiment analysis command
│ ├── fillmask.py # Masked token prediction
│ ├── textgen.py # Text generation
│ ├── ner.py # Named Entity Recognition
│ ├── qa.py # Question Answering (extractive)
│ └── moderation.py # Content moderation / toxicity detection
├── pipelines/ # ML logic based on Hugging Face pipelines
│ ├── template.py # Blueprint for creating new pipelines
│ ├── sentiment.py
│ ├── fillmask.py
│ ├── textgen.py
│ ├── ner.py
│ ├── qa.py
│ └── moderation.py
├── api/
│ ├── app.py # FastAPI app and endpoints
│ ├── models.py # Pydantic schemas
│ └── config.py # API configuration
└── config/
└── settings.py # Global configuration (models, params)
```
---
## ⚙️ Installation
### 🧾 Option 1 Poetry (Recommended)
> Poetry is the main dependency manager for this project.
```bash
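poetry install   # install dependencies from pyproject.toml
poetry shell     # optional: open a shell in the virtualenv (Poetry 2.x needs the poetry-plugin-shell plugin)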
```
This installs all dependencies defined in `pyproject.toml` (including `transformers`, `torch`, and `fastapi`).
Run the app:
```bash
# CLI mode
poetry run python src/main.py --mode cli
# API mode
poetry run python src/main.py --mode api
```
---
### 📦 Option 2 Pip + requirements.txt
If you prefer manual dependency management:
```bash
python -m venv .venv
source .venv/bin/activate      # Linux/macOS
.venv\Scripts\Activate.ps1     # Windows (PowerShell)
pip install -r requirements.txt
```
---
## ▶️ Usage
### 🖥️ CLI Mode
Run the interactive CLI:
```bash
python -m src.main --mode cli
```
Interactive menu:
```
Welcome to AI Lab - Transformers CLI Playground
Available commands:
• sentiment Analyze the sentiment of a text
• fillmask Predict masked words in a sentence
• textgen Generate text from a prompt
• ner Extract named entities from text
• qa Answer questions from a context
• moderation Detect toxic or unsafe content
```
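
The menu above suggests a simple registry-and-dispatch loop. Below is a minimal sketch of one way to build it. The `CLICommand` interface (`name`, `description`, `execute()`), the `EchoCommand` placeholder and the `read`/`write` callables are assumptions for illustration, not the actual contents of `src/cli/base.py`. Injecting `read` and `write` instead of calling `input()` and `print()` directly keeps the loop testable; interactively you would call `run_shell(registry, input, print)`.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class CLICommand(ABC):
    """Hypothetical command interface; src/cli/base.py may look different."""

    name: str = ""
    description: str = ""

    @abstractmethod
    def execute(self, text: str) -> str:
        """Process one user input and return text to display."""


class EchoCommand(CLICommand):
    """Placeholder command that needs no model, used only to show dispatch."""

    name = "echo"
    description = "Repeat the input back"

    def execute(self, text: str) -> str:
        return f"→ {text}"


def render_menu(commands: Dict[str, CLICommand]) -> str:
    """Build the 'Available commands' listing shown at startup."""
    lines = ["Available commands:"]
    for cmd in commands.values():
        lines.append(f"• {cmd.name:<12}{cmd.description}")
    return "\n".join(lines)


def run_shell(
    commands: Dict[str, CLICommand],
    read: Callable[[str], str],
    write: Callable[[str], None],
) -> None:
    """Ask for a command, then its input, until the user types 'exit' or 'quit'."""
    while True:
        choice = read("Command (or 'exit'): ").strip().lower()
        if choice in ("exit", "quit"):
            return
        cmd = commands.get(choice)
        if cmd is None:
            write(f"Unknown command: {choice}")
            continue
        write(cmd.execute(read("💬 Enter text: ")))


registry = {cmd.name: cmd for cmd in (EchoCommand(),)}
print(render_menu(registry))
```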
---
### 🌐 API Mode
Run the FastAPI server:
```bash
python -m src.main --mode api
# Custom config
python -m src.main --mode api --host 0.0.0.0 --port 8000 --reload
```
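
The flags above could be parsed with `argparse` roughly as in the sketch below. The defaults (`cli` mode, host `127.0.0.1`, port `8000`) are assumptions, and `src/main.py` may define them differently. In API mode the parsed values would typically be passed to `uvicorn.run(...)`; note that uvicorn's `reload=True` only works when the app is given as an import string (for example `"src.api.app:app"`) rather than as an object.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the documented flags; default values are assumptions."""
    parser = argparse.ArgumentParser(description="AI Lab Transformers CLI Playground")
    parser.add_argument("--mode", choices=["cli", "api"], default="cli",
                        help="Run the interactive CLI or the FastAPI server")
    parser.add_argument("--host", default="127.0.0.1",
                        help="API bind address (assumed default)")
    parser.add_argument("--port", type=int, default=8000,
                        help="API port (assumed default)")
    parser.add_argument("--reload", action="store_true",
                        help="Restart the API server on code changes (development only)")
    return parser


args = build_parser().parse_args(
    ["--mode", "api", "--host", "0.0.0.0", "--port", "8000", "--reload"]
)
print(args.mode, args.host, args.port, args.reload)
```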
API Docs:
- **Swagger** → http://localhost:8000/docs
- **ReDoc** → http://localhost:8000/redoc
- **OpenAPI** → http://localhost:8000/openapi.json
---
## 📡 API Endpoints
### Core Endpoints
| Method | Endpoint | Description |
| ------ | --------- | ------------------------- |
| `GET` | `/` | Health check and API info |
| `GET` | `/health` | Detailed health status |
### Individual Processing
| Method | Endpoint | Description |
| ------ | ------------- | ---------------------- |
| `POST` | `/sentiment` | Analyze text sentiment |
| `POST` | `/fillmask` | Predict masked words |
| `POST` | `/textgen` | Generate text |
| `POST` | `/ner` | Extract named entities |
| `POST` | `/qa` | Question answering |
| `POST` | `/moderation` | Content moderation |
### Batch Processing
| Method | Endpoint | Description |
| ------ | ------------------- | -------------------------- |
| `POST` | `/sentiment/batch` | Process multiple texts |
| `POST` | `/fillmask/batch` | Fill multiple masked texts |
| `POST` | `/textgen/batch` | Generate from prompts |
| `POST` | `/ner/batch` | Extract entities in batch |
| `POST` | `/qa/batch` | Answer questions in batch |
| `POST` | `/moderation/batch` | Moderate multiple texts |
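
The endpoints can be called from any HTTP client. The standard-library sketch below assumes request bodies of the form `{"text": ...}` for single calls and `{"texts": [...]}` for batch calls; those field names are guesses, so confirm them against `src/api/models.py` or the Swagger page at `/docs`. `build_request()` works offline, while `call_api()` needs the server to be running.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8000"  # default address from the API Mode section


def build_request(endpoint: str, payload: Dict[str, Any],
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a JSON POST request for one of the endpoints listed above."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(endpoint: str, payload: Dict[str, Any], base_url: str = BASE_URL) -> Any:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_request(endpoint, payload, base_url), timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Assumed payload shapes; check the Pydantic schemas before relying on them.
single = build_request("/sentiment", {"text": "I absolutely love this project!"})
batch = build_request("/sentiment/batch", {"texts": ["Great!", "Terrible."]})
print(single.full_url, single.get_method())
```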
---
## 🖥️ CLI Examples
### 🔹 Sentiment Analysis
```text
💬 Enter text: I absolutely love this project!
→ Sentiment: POSITIVE (score: 0.998)
```
### 🔹 Fill-Mask
```text
💬 Enter text: The capital of France is [MASK].
→ Predictions:
1) Paris score: 0.87
2) Lyon score: 0.04
```
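
Hugging Face's fill-mask pipeline returns a list of dicts with `score`, `token`, `token_str` and `sequence` keys. The sketch below shows how a display helper might turn that into the numbered list above; `format_fillmask` is a hypothetical name, and the sample scores and token ids are made up. Because the default `bert-base-uncased` model is uncased, predicted tokens come back lowercase (`paris`, not `Paris`).

```python
from typing import Any, Dict, List


def format_fillmask(predictions: List[Dict[str, Any]], top_k: int = 5) -> str:
    """Render fill-mask predictions as a ranked, numbered list."""
    lines = ["→ Predictions:"]
    # The pipeline already sorts by score; sorting again keeps the helper safe for any input.
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)[:top_k]
    for rank, pred in enumerate(ranked, start=1):
        lines.append(f"{rank}) {pred['token_str']:<10} score: {pred['score']:.2f}")
    return "\n".join(lines)


# Illustrative input with the pipeline's keys; scores and token ids are placeholders.
sample = [
    {"score": 0.04, "token": 0, "token_str": "lyon", "sequence": "the capital of france is lyon."},
    {"score": 0.87, "token": 1, "token_str": "paris", "sequence": "the capital of france is paris."},
]
print(format_fillmask(sample))
```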
### 🔹 Text Generation
```text
💬 Prompt: Once upon a time
→ Output: Once upon a time there was a young AI learning to code...
```
### 🔹 NER
```text
💬 Enter text: Elon Musk founded SpaceX in California.
→ Entities:
- Elon Musk (PERSON)
- SpaceX (ORG)
- California (LOC)
```
### 🔹 QA (Question Answering)
```text
💬 Enter question: What is the capital of France?
💬 Enter context: France is a country in Europe. Its capital is Paris.
→ Answer: Paris
```
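
The QA command is extractive: the pipeline does not compose a sentence, it selects a span of the context and reports its character offsets as `start` and `end` alongside `answer` and `score`. The sketch below checks that property on a hand-built result; `check_extractive` and `format_qa` are hypothetical helpers, and the score is illustrative rather than real model output.

```python
from typing import Any, Dict


def check_extractive(result: Dict[str, Any], context: str) -> bool:
    """True when `answer` is exactly the context slice [start:end]."""
    return context[result["start"]:result["end"]] == result["answer"]


def format_qa(result: Dict[str, Any]) -> str:
    """Console line in the style of the CLI example above."""
    return f"→ Answer: {result['answer']}"


context = "France is a country in Europe. Its capital is Paris."
start = context.index("Paris")
# Hand-built result with the same keys as the pipeline output; the score is made up.
result = {"score": 0.9, "start": start, "end": start + len("Paris"), "answer": "Paris"}
print(check_extractive(result, context), format_qa(result))
```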
### 🔹 Moderation
```text
💬 Enter text: I hate everything!
→ Result: FLAGGED (toxic content detected)
```
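
`unitary/toxic-bert` is a multi-label classifier (labels such as `toxic`, `insult` and `threat`), so moderation comes down to thresholding per-label scores. The sketch below shows that decision; the 0.5 threshold, the `moderate` and `format_moderation` helpers, the `OK` wording and the sample scores are assumptions, not values from the project. With a recent `transformers`, passing `top_k=None` to a text-classification pipeline returns a score for every label, which is the input shape assumed here.

```python
from typing import Any, Dict, List

THRESHOLD = 0.5  # assumed cut-off, not a value taken from the project


def moderate(scores: List[Dict[str, Any]], threshold: float = THRESHOLD) -> Dict[str, Any]:
    """Flag the text if any label's score reaches the threshold."""
    hits = [s["label"] for s in scores if float(s["score"]) >= threshold]
    return {"flagged": bool(hits), "labels": hits}


def format_moderation(result: Dict[str, Any]) -> str:
    """Console line in the style of the CLI example above."""
    if result["flagged"]:
        return f"→ Result: FLAGGED ({', '.join(result['labels'])})"
    return "→ Result: OK"


# Made-up scores in the list-of-dicts shape a text-classification pipeline returns.
sample = [{"label": "toxic", "score": 0.91}, {"label": "insult", "score": 0.12}]
print(format_moderation(moderate(sample)))
```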
---
## 🧠 Architecture Overview
Both CLI and API share the **same pipeline layer**, ensuring code reusability and consistency.
### CLI Architecture
```text
InteractiveCLI → Command Layer → Pipeline Layer → Display Layer
```
### API Architecture
```text
FastAPI App → Pydantic Models → Pipeline Layer → JSON Response
```
| Layer | Description |
| ------------ | ---------------------------------------------- |
| **CLI** | Manages user input/output and navigation. |
| **API** | Exposes endpoints with automatic OpenAPI docs. |
| **Command** | Encapsulates user-facing operations. |
| **Pipeline** | Wraps Hugging Face pipelines.                  |
| **Models** | Validates requests/responses. |
| **Display** | Formats console output. |
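
The layering can be sketched with a stub model in place of Hugging Face. In this hypothetical example, one `SentimentPipeline` instance serves both a CLI command (console text via a `DisplayFormatter`) and an API-style handler (a JSON-ready dict). The class names echo the table above but are not the project's actual code. In the real application the stub would be replaced by something like `transformers.pipeline("sentiment-analysis", model=...)`, which returns `[{"label": ..., "score": ...}]` for a single input string.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Stand-in type for a Hugging Face text-classification pipeline:
# a callable mapping one string to a list of {"label": ..., "score": ...} dicts.
Classifier = Callable[[str], List[Dict[str, Any]]]


@dataclass
class SentimentPipeline:
    """Pipeline layer: the only place that talks to the model."""

    classifier: Classifier

    def analyze(self, text: str) -> Dict[str, Any]:
        top = self.classifier(text)[0]
        return {"label": top["label"], "score": float(top["score"])}


class DisplayFormatter:
    """Display layer: turns pipeline results into console text."""

    @staticmethod
    def sentiment(result: Dict[str, Any]) -> str:
        return f"→ Sentiment: {result['label']} (score: {result['score']:.3f})"


@dataclass
class SentimentCommand:
    """Command layer: no ML here, only delegation to pipeline and display."""

    pipeline: SentimentPipeline

    def execute(self, text: str) -> str:
        return DisplayFormatter.sentiment(self.pipeline.analyze(text))


def sentiment_endpoint(pipeline: SentimentPipeline, payload: Dict[str, str]) -> Dict[str, Any]:
    """API-layer stand-in: same pipeline, JSON-ready dict instead of console text."""
    return pipeline.analyze(payload["text"])


def fake_model(text: str) -> List[Dict[str, Any]]:
    # Fixed output so the example runs without downloading a model.
    return [{"label": "POSITIVE", "score": 0.998}]


shared = SentimentPipeline(fake_model)
print(SentimentCommand(shared).execute("I absolutely love this project!"))
print(sentiment_endpoint(shared, {"text": "I absolutely love this project!"}))
```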
---
## ⚙️ Configuration
All configuration is centralized in `src/config/settings.py`:
```python
class Config:
    # Hugging Face Hub model ID used by each task
    DEFAULT_MODELS = {
        "sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
        "fillmask": "bert-base-uncased",
        "textgen": "gpt2",
        "ner": "dslim/bert-base-NER",
        "qa": "distilbert-base-cased-distilled-squad",
        "moderation": "unitary/toxic-bert",
    }
    MAX_LENGTH = 512
    BATCH_SIZE = 8
```
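
The configuration is easiest to consume through one helper that turns a playground task key into arguments for `transformers.pipeline`. The sketch below assumes the task-name mapping shown (the Hugging Face names themselves are standard pipeline identifiers) and adds `aggregation_strategy="simple"` for NER so that word pieces such as "Elon" and "Musk" are merged into one entity; neither choice is taken from `settings.py`. It would be called as `load_pipeline("sentiment", Config.DEFAULT_MODELS)`, and `BATCH_SIZE` could be forwarded as the `batch_size` argument when calling a pipeline on a list of texts.

```python
from typing import Any, Dict, Tuple

# Assumed mapping from playground task keys to Hugging Face pipeline task names.
HF_TASKS: Dict[str, str] = {
    "sentiment": "sentiment-analysis",
    "fillmask": "fill-mask",
    "textgen": "text-generation",
    "ner": "ner",
    "qa": "question-answering",
    "moderation": "text-classification",
}

# Assumed per-task extras: merge word pieces into whole entities for NER.
PIPELINE_KWARGS: Dict[str, Dict[str, Any]] = {"ner": {"aggregation_strategy": "simple"}}


def resolve(task: str, models: Dict[str, str]) -> Tuple[str, str, Dict[str, Any]]:
    """Return (hf_task, model_id, extra_kwargs) for a playground task key."""
    if task not in models or task not in HF_TASKS:
        raise KeyError(f"Unknown task {task!r}; configured tasks: {sorted(models)}")
    return HF_TASKS[task], models[task], dict(PIPELINE_KWARGS.get(task, {}))


def load_pipeline(task: str, models: Dict[str, str]):
    """Build a Hugging Face pipeline for `task` (downloads weights on first use)."""
    from transformers import pipeline  # deferred so importing this module stays cheap

    hf_task, model_id, kwargs = resolve(task, models)
    return pipeline(hf_task, model=model_id, **kwargs)
```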
---
## 🧩 Extending the Playground
To add a new NLP experiment (e.g., keyword extraction):
1. Duplicate `src/pipelines/template.py` → `src/pipelines/keywords.py`
2. Create a command: `src/commands/keywords.py`
3. Register it in `src/main.py`
4. Add Pydantic models in `src/api/models.py` and an endpoint in `src/api/app.py`
5. Add the task's default model to `Config.DEFAULT_MODELS`

Because the CLI command and the API endpoint both call the same pipeline class, the new logic is shared between them.
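
The steps above can be sketched as follows. Everything here is hypothetical: `BasePipeline` guesses at the shape of `template.py`, the `COMMANDS` dict and `register()` stand in for whatever registration `src/main.py` actually uses, `"some/keyword-model"` is a placeholder ID, and `stub_loader` replaces a real Hugging Face model so the example runs offline.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Tuple


class BasePipeline(ABC):
    """Hypothetical template: load the model once, then process inputs."""

    def __init__(self, model_id: str, loader: Callable[[str], Any]):
        self.model_id = model_id
        self.model = loader(model_id)  # in practice, e.g. a transformers.pipeline(...)

    @abstractmethod
    def process(self, text: str) -> Dict[str, Any]: ...


class KeywordsPipeline(BasePipeline):
    def process(self, text: str, top_n: int = 5) -> Dict[str, Any]:
        candidates = self.model(text)  # assumed: list of (keyword, score) pairs
        ranked = sorted(candidates, key=lambda kv: kv[1], reverse=True)
        seen, keywords = set(), []
        for word, score in ranked:
            if word.lower() not in seen:  # drop case-insensitive duplicates
                seen.add(word.lower())
                keywords.append({"keyword": word, "score": score})
        return {"text": text, "keywords": keywords[:top_n]}


# Hypothetical registry the CLI could dispatch on (step 3).
COMMANDS: Dict[str, Callable[[str], Dict[str, Any]]] = {}


def register(name: str, pipe: BasePipeline) -> None:
    COMMANDS[name] = pipe.process


def stub_loader(model_id: str) -> Callable[[str], List[Tuple[str, float]]]:
    # Placeholder model returning fixed candidates; ignores model_id.
    return lambda text: [("Transformers", 0.9), ("transformers", 0.8), ("CLI", 0.6)]


register("keywords", KeywordsPipeline("some/keyword-model", stub_loader))
print(COMMANDS["keywords"]("Transformers CLI playground"))
```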
---
## 🧰 Troubleshooting
| Issue | Solution |
| ------------------------ | ----------------------- |
| `transformers` not found | Activate your venv. |
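| Torch install fails      | Install the CPU-only wheel: `pip install torch --index-url https://download.pytorch.org/whl/cpu` |
| Models download slowly   | Only the first run downloads weights; they are cached afterwards (by default under `~/.cache/huggingface`). |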
| Encoding issues | Ensure UTF-8 terminal. |
### API Issues
| Issue | Solution |
| -------------------- | --------------------------------------- |
| `FastAPI` missing    | `pip install fastapi "uvicorn[standard]"` (quotes stop zsh from expanding the brackets) |
| Port in use | Change with `--port 8001` |
| CORS error | Edit `allow_origins` in `api/config.py` |
| Validation error 422 | Check the request body against the schema shown at `/docs` |
| 500 error            | Check the server logs to confirm the model loaded correctly |
---
## 🧭 Development Guidelines
- Keep command classes lightweight (no ML inside)
- Use the pipeline template for new tasks
- Format all outputs via `DisplayFormatter`
- Document new commands and models
---
## 🧱 Roadmap
- [ ] Non-interactive CLI flags (`--text`, `--task`)
- [ ] Multilingual models
- [ ] Test coverage
- [ ] Logging & profiling
- [ ] Export to JSON/CSV
---
## 📜 License
Licensed under the [MIT License](./LICENSE).
You are free to use, modify, and distribute this project.

---
**End of Documentation**
_The AI Lab Transformers CLI Playground: built for learning, experimenting, and sharing NLP excellence._