# 🧠 AI Lab – Transformers CLI Playground

> A **pedagogical and technical project** designed for AI practitioners and students to experiment with Hugging Face Transformers through an **interactive Command‑Line Interface (CLI)**.
> This playground provides ready‑to‑use NLP pipelines (Sentiment Analysis, Named Entity Recognition, Text Generation, Fill‑Mask, Moderation, etc.) in a modular, extensible, and educational codebase.

---

## 📚 Overview

The **AI Lab – Transformers CLI Playground** allows you to explore multiple natural language processing tasks directly from the terminal.
Each task (e.g., sentiment, NER, text generation) is implemented as a **Command Module**, which interacts with a **Pipeline Module** built on top of the `transformers` library.

The lab is intentionally structured to demonstrate **clean software design for ML codebases** — with strict separation between configuration, pipelines, CLI logic, and display formatting.

---

## 🗂️ Project Structure

```text
src/
├── __init__.py
├── main.py              # CLI entry point
│
├── cli/
│   ├── __init__.py
│   ├── base.py          # CLICommand base class & interactive shell handler
│   └── display.py       # Console formatting utilities (tables, colors, results)
│
├── commands/            # User-facing commands wrapping pipeline logic
│   ├── __init__.py
│   ├── sentiment.py     # Sentiment analysis command
│   ├── fillmask.py      # Masked token prediction command
│   ├── textgen.py       # Text generation command
│   ├── ner.py           # Named Entity Recognition command
│   └── moderation.py    # Toxicity / content moderation command
│
├── pipelines/           # Machine learning logic (Hugging Face Transformers)
│   ├── __init__.py
│   ├── template.py      # Blueprint for creating new pipelines
│   ├── sentiment.py
│   ├── fillmask.py
│   ├── textgen.py
│   ├── ner.py
│   └── moderation.py
│
└── config/
    ├── __init__.py
    └── settings.py      # Global configuration (default models, parameters)
```

---

## ⚙️ Installation

### 🧾 Option 1 – Using Poetry (Recommended)

> Poetry is used as the main dependency manager.

```bash
# 1. Create and activate a new virtual environment
poetry shell

# 2. Install dependencies
poetry install
```

This will automatically install all dependencies declared in `pyproject.toml`, including **transformers** and **torch**.

To run the CLI inside the Poetry environment:

```bash
poetry run python src/main.py
```

---

### 📦 Option 2 – Using pip and requirements.txt

If you prefer to manage dependencies manually with `requirements.txt`:

```bash
# 1. Create a virtual environment
python -m venv .venv

# 2. Activate it
# Linux/macOS
source .venv/bin/activate
# Windows PowerShell
.venv\Scripts\Activate.ps1

# 3. Install dependencies
pip install -r requirements.txt
```

---

## ▶️ Usage

Once installed, launch the CLI with:

```bash
python -m src.main
# or, if using Poetry
poetry run python src/main.py
```

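Under the hood, `src/main.py` wires each command into the interactive shell. The sketch below shows one plausible shape for that entry point, assuming the `InteractiveCLI` class from `src/cli/base.py`, the `register_command()` call shown later in the Extending section, and illustrative command class names; it is not the project's literal code.

```python
# Illustrative sketch of src/main.py: class names and the run() method are assumptions.
from src.cli.base import InteractiveCLI
from src.commands.sentiment import SentimentCommand
from src.commands.fillmask import FillMaskCommand
from src.commands.textgen import TextGenCommand
from src.commands.ner import NerCommand
from src.commands.moderation import ModerationCommand


def main() -> None:
    cli = InteractiveCLI()
    # Register every user-facing command with the interactive shell.
    for command in (
        SentimentCommand(),
        FillMaskCommand(),
        TextGenCommand(),
        NerCommand(),
        ModerationCommand(),
    ):
        cli.register_command(command)
    cli.run()  # start the interactive prompt loop


if __name__ == "__main__":
    main()
```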

You’ll see an interactive menu listing the available commands:

```text
Welcome to AI Lab - Transformers CLI Playground
Available commands:
  • sentiment   – Analyze the sentiment of a text
  • fillmask    – Predict masked words in a sentence
  • textgen     – Generate text from a prompt
  • ner         – Extract named entities from text
  • moderation  – Detect toxic or unsafe content
```

### Example Sessions

#### 🔹 Sentiment Analysis
```text
💬 Enter text: I absolutely love this project!
→ Sentiment: POSITIVE (score: 0.998)
```

#### 🔹 Fill‑Mask
```text
💬 Enter text: The capital of France is [MASK].
→ Predictions:
   1) Paris    score: 0.87
   2) Lyon     score: 0.04
   3) London   score: 0.02
```

#### 🔹 Text Generation
```text
💬 Prompt: Once upon a time
→ Output: Once upon a time there was a young AI learning to code...
```

#### 🔹 NER (Named Entity Recognition)
```text
💬 Enter text: Elon Musk founded SpaceX in California.
→ Entities:
   - Elon Musk (PERSON)
   - SpaceX (ORG)
   - California (LOC)
```

#### 🔹 Moderation
```text
💬 Enter text: I hate everything!
→ Result: FLAGGED (toxic content detected)
```

---

## 🧠 Architecture Overview

The internal structure follows a clean **Command ↔ Pipeline ↔ Display** pattern:

```text
┌──────────────────────┐
│    InteractiveCLI    │
│  (src/cli/base.py)   │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│    Command Layer     │  ← e.g. sentiment.py
│   (user commands)    │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│    Pipeline Layer    │  ← e.g. pipelines/sentiment.py
│      (ML logic)      │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│    Display Layer     │  ← cli/display.py
│   (format output)    │
└──────────────────────┘
```

### Key Concepts

| Layer | Description |
|-------|-------------|
| **CLI** | Manages user input/output, help menus, and navigation between commands. |
| **Command** | Encapsulates a single user-facing operation (e.g., run sentiment). |
| **Pipeline** | Wraps Hugging Face’s `transformers.pipeline()` to perform inference. |
| **Display** | Handles clean console rendering (colored output, tables, JSON formatting). |
| **Config** | Centralizes model names, limits, and global constants. |

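To make the layering concrete, here is a minimal, hypothetical sketch of a command class tying the three layers together. The `CLICommand` base class and the module paths come from the project structure above; the method names (`analyze()`, `print_result()`) are assumptions for illustration only.

```python
# Hypothetical shape of a command: ML logic and rendering stay in their own layers.
from src.cli.base import CLICommand                      # interactive shell base class
from src.cli.display import DisplayFormatter             # console rendering helpers
from src.pipelines.sentiment import SentimentPipeline    # wraps transformers.pipeline()


class SentimentCommand(CLICommand):
    name = "sentiment"
    help = "Analyze the sentiment of a text"

    def __init__(self) -> None:
        self.pipeline = SentimentPipeline()   # Pipeline layer: inference only
        self.display = DisplayFormatter()     # Display layer: formatting only

    def run(self, text: str) -> None:
        result = self.pipeline.analyze(text)  # e.g. {"label": "POSITIVE", "score": 0.998}
        self.display.print_result(result)     # the command itself contains no ML code
```
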
---

## ⚙️ Configuration

All configuration is centralized in `src/config/settings.py`.

Example:

```python
class Config:
    DEFAULT_MODELS = {
        "sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
        "fillmask": "bert-base-uncased",
        "textgen": "gpt2",
        "ner": "dslim/bert-base-NER",
        "moderation": "unitary/toxic-bert",
    }
    MAX_LENGTH = 512
    BATCH_SIZE = 8
```

You can easily modify model names to experiment with different checkpoints.

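For reference, a pipeline module would typically read these defaults when constructing its Hugging Face pipeline. The snippet below is a sketch under that assumption; only `transformers.pipeline()` and the `Config` attributes shown above come from the source.

```python
from transformers import pipeline          # Hugging Face task pipeline factory

from src.config.settings import Config

# Illustrative: build the sentiment pipeline from the configured checkpoint.
sentiment = pipeline("sentiment-analysis", model=Config.DEFAULT_MODELS["sentiment"])

print(sentiment("I absolutely love this project!"))
# → [{'label': 'POSITIVE', 'score': 0.998...}]
```
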
---

## 🧩 Extending the Playground

To create a new experiment (e.g., keyword extraction):

1. **Duplicate** `src/pipelines/template.py` → `src/pipelines/keywords.py` and implement the `run()` or `analyze()` logic using a new Hugging Face pipeline (see the sketch after this list).

2. **Create a Command** in `src/commands/keywords.py` to interact with users.

3. **Register the command** inside `src/main.py`:

   ```python
   from src.commands.keywords import KeywordsCommand

   cli.register_command(KeywordsCommand())
   ```

4. Optionally, add a model name in `Config.DEFAULT_MODELS`.

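As a concrete illustration of step 1, a keyword pipeline module might look like the sketch below. The class shape follows the structure described above, but the method name, the token-classification task, and the suggested checkpoint are assumptions rather than the project's actual template.

```python
# src/pipelines/keywords.py: illustrative sketch; names and checkpoint are placeholders.
from transformers import pipeline


class KeywordsPipeline:
    """Wraps a Hugging Face pipeline for keyword / keyphrase extraction."""

    def __init__(self, model: str = "ml6team/keyphrase-extraction-kbir-inspec") -> None:
        # Keyphrase extraction is commonly served as a token-classification task;
        # the checkpoint above is only a placeholder suggestion.
        self._pipe = pipeline("token-classification", model=model, aggregation_strategy="simple")

    def analyze(self, text: str) -> list[dict]:
        # Returns aggregated spans, e.g. [{"word": "...", "score": 0.9, ...}]
        return self._pipe(text)
```
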
---

## 🧪 Testing

You can use `pytest` for lightweight validation:

```bash
pip install pytest
pytest -q
```

Recommended structure:

```text
tests/
├── test_sentiment.py
├── test_textgen.py
└── ...
```

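For example, a `tests/test_sentiment.py` could exercise the sentiment pipeline end to end. The test below reuses the illustrative `SentimentPipeline.analyze()` naming from earlier sections, so adapt it to the real API; note that it downloads the default checkpoint on first run.

```python
# tests/test_sentiment.py: illustrative; adapt names to the actual pipeline API.
from src.pipelines.sentiment import SentimentPipeline


def test_sentiment_is_positive():
    result = SentimentPipeline().analyze("I absolutely love this project!")
    # Expect a label/score pair similar to the example session in the README.
    assert result["label"] == "POSITIVE"
    assert result["score"] > 0.9
```
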
---

## 🧰 Troubleshooting

| Issue | Cause / Solution |
|-------|------------------|
| **`transformers` not found** | Check that the virtual environment is activated. |
| **Torch fails to install** | Install the CPU-only build from the PyTorch index, e.g. `pip install torch --index-url https://download.pytorch.org/whl/cpu`. |
| **Models download slowly** | Hugging Face caches models after the first run. |
| **Unicode / accents broken** | Ensure the terminal encoding is UTF‑8. |

---

## 🧭 Development Guidelines

- Keep **Command** classes lightweight — no ML logic inside them.
- Reuse the **Pipeline Template** for new experiments.
- Format outputs consistently via the `DisplayFormatter`.
- Document all new models or commands in `README.md` and `settings.py`.

---

## 🧱 Roadmap

- [ ] Add non-interactive CLI flags (`--text`, `--task`)
- [ ] Add multilingual model options
- [ ] Add automatic test coverage
- [ ] Add logging and profiling utilities
- [ ] Add export of results to JSON/CSV

---

## 🪪 License

You can include a standard open-source license such as **MIT** or **Apache 2.0**, depending on your use case.

---

## 🤝 Contributing

This repository is meant as an **educational sandbox** for experimenting with Transformers.
Pull requests are welcome for new models, better CLI UX, or educational improvements.

---

### ✨ Key Takeaways

- Modular and pedagogical design for training environments
- Clean separation between **I/O**, **ML logic**, and **UX**
- Easily extensible architecture for adding custom pipelines
- Perfect sandbox for students, researchers, and developers to learn modern NLP tools

---

> 🧩 Built for experimentation. Learn, break, and rebuild.