# 🧠 AI Lab – Transformers CLI Playground

> A **pedagogical and technical project** designed for AI practitioners and students to explore **Hugging Face Transformers** through an **interactive Command-Line Interface (CLI)** or a **REST API**.
>
> This playground provides ready-to-use NLP pipelines — including **Sentiment Analysis**, **Named Entity Recognition**, **Text Generation**, **Fill-Mask**, **Question Answering (QA)**, **Moderation**, and more — in a modular, extensible, and educational codebase.

---

<p align="center">
  <img src="https://img.shields.io/badge/Python-3.13-blue.svg" alt="Python"/>
  <img src="https://img.shields.io/badge/Built_with-Poetry-purple.svg" alt="Poetry"/>
  <img src="https://img.shields.io/badge/🤗-Transformers-orange.svg" alt="Transformers"/>
  <img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License"/>
</p>

---
## 📑 Table of Contents

- [📚 Overview](#-overview)
- [🗂️ Project Structure](#️-project-structure)
- [⚙️ Installation](#️-installation)
  - [🧾 Option 1 – Poetry (Recommended)](#-option-1--poetry-recommended)
  - [📦 Option 2 – Pip + requirements.txt](#-option-2--pip--requirementstxt)
- [▶️ Usage](#️-usage)
  - [🖥️ CLI Mode](#️-cli-mode)
  - [🌐 API Mode](#-api-mode)
- [📡 API Endpoints](#-api-endpoints)
- [🖥️ CLI Examples](#️-cli-examples)
- [🧠 Architecture Overview](#-architecture-overview)
- [⚙️ Configuration](#️-configuration)
- [🧩 Extending the Playground](#-extending-the-playground)
- [🧰 Troubleshooting](#-troubleshooting)
- [🧭 Development Guidelines](#-development-guidelines)
- [🧱 Roadmap](#-roadmap)
- [📜 License](#-license)

---
## 📚 Overview

The **AI Lab – Transformers CLI Playground** enables users to explore **multiple NLP tasks directly from the terminal or via HTTP APIs**.
Each task (sentiment, NER, text generation, etc.) is implemented as a **Command Module** that communicates with a **Pipeline Module** powered by Hugging Face’s `transformers` library.

The project demonstrates **clean ML code architecture** with strict separation between:

- Configuration
- Pipelines
- CLI logic
- Display formatting

It’s a great educational resource for learning **how to structure ML applications** professionally.

---
## 🗂️ Project Structure

```text
src/
├── main.py              # CLI entry point
│
├── cli/
│   ├── base.py          # CLICommand base class & interactive shell
│   └── display.py       # Console formatting utilities (colors, tables, results)
│
├── commands/            # User-facing commands wrapping pipeline logic
│   ├── sentiment.py     # Sentiment analysis command
│   ├── fillmask.py      # Masked token prediction
│   ├── textgen.py       # Text generation
│   ├── ner.py           # Named Entity Recognition
│   ├── qa.py            # Question Answering (extractive)
│   └── moderation.py    # Content moderation / toxicity detection
│
├── pipelines/           # ML logic based on Hugging Face pipelines
│   ├── template.py      # Blueprint for creating new pipelines
│   ├── sentiment.py
│   ├── fillmask.py
│   ├── textgen.py
│   ├── ner.py
│   ├── qa.py
│   └── moderation.py
│
├── api/
│   ├── app.py           # FastAPI app and endpoints
│   ├── models.py        # Pydantic schemas
│   └── config.py        # API configuration
│
└── config/
    └── settings.py      # Global configuration (models, params)
```

---
## ⚙️ Installation

### 🧾 Option 1 – Poetry (Recommended)

> Poetry is the main dependency manager for this project.

```bash
# Install dependencies into the project virtual environment
poetry install

# (Optional) spawn a shell inside that environment
poetry shell
```

This installs all dependencies defined in `pyproject.toml` (including `transformers`, `torch`, and `fastapi`).

Run the app:

```bash
# CLI mode
poetry run python src/main.py --mode cli

# API mode
poetry run python src/main.py --mode api
```

---
### 📦 Option 2 – Pip + requirements.txt

If you prefer manual dependency management:

```bash
python -m venv .venv
source .venv/bin/activate      # Linux/macOS
.venv\Scripts\Activate.ps1     # Windows (PowerShell)

pip install -r requirements.txt
```

---
## ▶️ Usage

### 🖥️ CLI Mode

Run the interactive CLI:

```bash
python -m src.main --mode cli
```

Interactive menu:

```text
Welcome to AI Lab - Transformers CLI Playground
Available commands:
  • sentiment  – Analyze the sentiment of a text
  • fillmask   – Predict masked words in a sentence
  • textgen    – Generate text from a prompt
  • ner        – Extract named entities from text
  • qa         – Answer questions from a context
  • moderation – Detect toxic or unsafe content
```
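Under the hood, `src/main.py` switches between the interactive shell and the FastAPI server based on the `--mode` flag. The snippet below is only a minimal sketch of what that dispatch might look like; the flags match the commands shown in this section, but the import string (`src.api.app:app`) and the `InteractiveCLI().run()` call are assumptions rather than the repository's guaranteed code.

```python
# Hypothetical sketch of src/main.py's mode dispatch (illustrative only).
import argparse

import uvicorn


def main() -> None:
    parser = argparse.ArgumentParser(description="AI Lab - Transformers CLI Playground")
    parser.add_argument("--mode", choices=["cli", "api"], default="cli")
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument("--reload", action="store_true")
    args = parser.parse_args()

    if args.mode == "api":
        # Hand off to uvicorn; the import string assumes the app lives in src/api/app.py.
        uvicorn.run("src.api.app:app", host=args.host, port=args.port, reload=args.reload)
    else:
        # Assumed interface of the interactive shell defined in src/cli/base.py.
        from src.cli.base import InteractiveCLI

        InteractiveCLI().run()


if __name__ == "__main__":
    main()
```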
---
### 🌐 API Mode

Run the FastAPI server:

```bash
python -m src.main --mode api

# Custom config
python -m src.main --mode api --host 0.0.0.0 --port 8000 --reload
```
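Once the server is running, any HTTP client can exercise the endpoints listed in the **📡 API Endpoints** section below. Here is a minimal sketch using `requests`; note that the request body shape (a `text` field for single requests, a `texts` list for batch requests) is an assumption, so check the Swagger docs at `/docs` for the exact schemas generated from the Pydantic models.

```python
# Minimal client sketch; the field names "text" and "texts" are assumed, verify against /docs.
import requests

BASE_URL = "http://localhost:8000"

# Single text
resp = requests.post(f"{BASE_URL}/sentiment", json={"text": "I absolutely love this project!"})
print(resp.status_code, resp.json())

# Batch variant
resp = requests.post(
    f"{BASE_URL}/sentiment/batch",
    json={"texts": ["Great work!", "This is terrible."]},
)
print(resp.status_code, resp.json())
```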
API Docs:

- **Swagger** → http://localhost:8000/docs
- **ReDoc** → http://localhost:8000/redoc
- **OpenAPI** → http://localhost:8000/openapi.json

---
## 📡 API Endpoints

### Core Endpoints

| Method | Endpoint  | Description               |
| ------ | --------- | ------------------------- |
| `GET`  | `/`       | Health check and API info |
| `GET`  | `/health` | Detailed health status    |

### Individual Processing

| Method | Endpoint      | Description            |
| ------ | ------------- | ---------------------- |
| `POST` | `/sentiment`  | Analyze text sentiment |
| `POST` | `/fillmask`   | Predict masked words   |
| `POST` | `/textgen`    | Generate text          |
| `POST` | `/ner`        | Extract named entities |
| `POST` | `/qa`         | Question answering     |
| `POST` | `/moderation` | Content moderation     |

### Batch Processing

| Method | Endpoint            | Description                |
| ------ | ------------------- | -------------------------- |
| `POST` | `/sentiment/batch`  | Process multiple texts     |
| `POST` | `/fillmask/batch`   | Fill multiple masked texts |
| `POST` | `/textgen/batch`    | Generate from prompts      |
| `POST` | `/ner/batch`        | Extract entities in batch  |
| `POST` | `/qa/batch`         | Answer questions in batch  |
| `POST` | `/moderation/batch` | Moderate multiple texts    |

---
## 🖥️ CLI Examples

### 🔹 Sentiment Analysis

```text
💬 Enter text: I absolutely love this project!
→ Sentiment: POSITIVE (score: 0.998)
```

### 🔹 Fill-Mask

```text
💬 Enter text: The capital of France is [MASK].
→ Predictions:
   1) Paris   score: 0.87
   2) Lyon    score: 0.04
```

### 🔹 Text Generation

```text
💬 Prompt: Once upon a time
→ Output: Once upon a time there was a young AI learning to code...
```

### 🔹 NER

```text
💬 Enter text: Elon Musk founded SpaceX in California.
→ Entities:
   - Elon Musk (PERSON)
   - SpaceX (ORG)
   - California (LOC)
```

### 🔹 QA (Question Answering)

```text
💬 Enter question: What is the capital of France?
💬 Enter context: France is a country in Europe. Its capital is Paris.
→ Answer: Paris
```

### 🔹 Moderation

```text
💬 Enter text: I hate everything!
→ Result: FLAGGED (toxic content detected)
```

---
## 🧠 Architecture Overview

Both CLI and API share the **same pipeline layer**, ensuring code reusability and consistency.

### CLI Architecture

```text
InteractiveCLI → Command Layer → Pipeline Layer → Display Layer
```

### API Architecture

```text
FastAPI App → Pydantic Models → Pipeline Layer → JSON Response
```
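To make the layering concrete, here is a minimal sketch of how a command object might delegate to a pipeline and hand the result to the display layer. The method names (`analyze`, `run`, `show_result`) and constructor signatures are illustrative assumptions; the real interfaces live in `src/cli/base.py`, `src/pipelines/`, and `src/cli/display.py`, and both front-ends build the same pipeline objects, only the outermost layer differs.

```python
# Illustrative layering sketch; names and signatures are assumptions, not the repo's API.
from transformers import pipeline


class SentimentPipeline:
    """Pipeline layer: wraps the Hugging Face pipeline and knows nothing about the CLI."""

    def __init__(self, model: str = "distilbert-base-uncased-finetuned-sst-2-english"):
        self._pipe = pipeline("sentiment-analysis", model=model)

    def analyze(self, text: str) -> dict:
        return self._pipe(text)[0]  # e.g. {"label": "POSITIVE", "score": 0.998}


class SentimentCommand:
    """Command layer: user-facing orchestration with no ML logic of its own."""

    def __init__(self, sentiment_pipeline: SentimentPipeline, display):
        self._pipeline = sentiment_pipeline
        self._display = display

    def run(self, text: str) -> None:
        result = self._pipeline.analyze(text)
        self._display.show_result("sentiment", result)  # Display layer formats the output
```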
| Layer        | Description                                    |
| ------------ | ---------------------------------------------- |
| **CLI**      | Manages user input/output and navigation.      |
| **API**      | Exposes endpoints with automatic OpenAPI docs. |
| **Command**  | Encapsulates user-facing operations.           |
| **Pipeline** | Wraps Hugging Face’s pipelines.                |
| **Models**   | Validates requests/responses.                  |
| **Display**  | Formats console output.                        |

---
## ⚙️ Configuration

All configuration is centralized in `src/config/settings.py`:

```python
class Config:
    DEFAULT_MODELS = {
        "sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
        "fillmask": "bert-base-uncased",
        "textgen": "gpt2",
        "ner": "dslim/bert-base-NER",
        "qa": "distilbert-base-cased-distilled-squad",
        "moderation": "unitary/toxic-bert",
    }
    MAX_LENGTH = 512
    BATCH_SIZE = 8
```
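A pipeline module can then resolve its checkpoint from this mapping instead of hard-coding it. The snippet below is only an illustration of that pattern; the import path is inferred from the project layout above and may differ in the actual code.

```python
# Illustrative use of Config.DEFAULT_MODELS; import path assumed from the project layout.
from transformers import pipeline

from src.config.settings import Config

# Build the sentiment pipeline from the centrally configured checkpoint.
sentiment = pipeline(
    "sentiment-analysis",
    model=Config.DEFAULT_MODELS["sentiment"],
)

print(sentiment("Centralized configuration keeps model choices in one place.")[0])
```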
---
## 🧩 Extending the Playground

To add a new NLP experiment (e.g., keyword extraction):

1. Duplicate `src/pipelines/template.py` → `src/pipelines/keywords.py`
2. Create a command: `src/commands/keywords.py`
3. Register it in `src/main.py`
4. Add Pydantic models and an API endpoint
5. Update `Config.DEFAULT_MODELS`

Both CLI and API will automatically share this logic (see the sketch below).
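As a hypothetical illustration of step 1, here is what a new pipeline module could look like. Summarization is used as a stand-in task because it maps directly onto a built-in Hugging Face pipeline; the class layout is only a guess at what `template.py` prescribes and the default model is an example checkpoint, so adapt both to the real blueprint. A matching `src/commands/` module and API endpoint (steps 2 to 4) would then plug it into both front-ends.

```python
# Hypothetical src/pipelines/summarize.py, a stand-in example for a new experiment.
# The structure mirrors the other pipeline modules; the exact template interface is assumed.
from transformers import pipeline


class SummarizePipeline:
    """Wraps a Hugging Face summarization pipeline (swap in your own task and model)."""

    def __init__(self, model: str = "sshleifer/distilbart-cnn-12-6"):
        # In the real project this would read Config.DEFAULT_MODELS["summarize"].
        self._pipe = pipeline("summarization", model=model)

    def run(self, text: str, max_length: int = 60) -> str:
        result = self._pipe(text, max_length=max_length, min_length=10, do_sample=False)
        return result[0]["summary_text"]
```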
---
## 🧰 Troubleshooting

| Issue                    | Solution                                                                                          |
| ------------------------ | ------------------------------------------------------------------------------------------------- |
| `transformers` not found | Activate your virtual environment.                                                                 |
| Torch install fails      | Use the CPU-only wheel, e.g. `pip install torch --index-url https://download.pytorch.org/whl/cpu`. |
| Models download slowly   | Models are cached after the first use.                                                             |
| Encoding issues          | Ensure your terminal uses UTF-8.                                                                   |

### API Issues

| Issue                | Solution                                   |
| -------------------- | ------------------------------------------ |
| `FastAPI` missing    | `pip install fastapi "uvicorn[standard]"`  |
| Port already in use  | Change it with `--port 8001`               |
| CORS error           | Edit `allow_origins` in `api/config.py`    |
| 422 validation error | Check the request body against the schema  |
| 500 error            | Verify that the model loaded correctly     |

---
## 🧭 Development Guidelines

- Keep command classes lightweight (no ML inside)
- Use the pipeline template for new tasks
- Format all outputs via `DisplayFormatter`
- Document new commands and models

---
## 🧱 Roadmap

- [ ] Non-interactive CLI flags (`--text`, `--task`)
- [ ] Multilingual models
- [ ] Test coverage
- [ ] Logging & profiling
- [ ] Export to JSON/CSV

---
## 📜 License

Licensed under the [MIT License](./LICENSE).
You are free to use, modify, and distribute this project.

---

✨ **End of Documentation**

_The AI Lab – Transformers CLI Playground: built for learning, experimenting, and sharing NLP excellence._