Init commit

commit 9b2a5497d9

@@ -0,0 +1,79 @@

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
ENV/
env/
.venv/
.env/

# PyInstaller
*.manifest
*.spec

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# VS Code
.vscode/

# macOS
.DS_Store

# Logs
*.log

# dotenv
.env
.env.*

# Local settings
local_settings.py

# System files
Thumbs.db
ehthumbs.db
Desktop.ini

@@ -0,0 +1,314 @@

# 🧠 AI Lab – Transformers CLI Playground

> A **pedagogical and technical project** designed for AI practitioners and students to experiment with Hugging Face Transformers through an **interactive Command‑Line Interface (CLI)**.
> This playground provides ready‑to‑use NLP pipelines (Sentiment Analysis, Named Entity Recognition, Text Generation, Fill‑Mask, Moderation, etc.) in a modular, extensible, and educational codebase.

---

## 📚 Overview

The **AI Lab – Transformers CLI Playground** allows you to explore multiple natural language processing tasks directly from the terminal.
Each task (e.g., sentiment, NER, text generation) is implemented as a **Command Module**, which interacts with a **Pipeline Module** built on top of the `transformers` library.

The lab is intentionally structured to demonstrate **clean software design for ML codebases** — with strict separation between configuration, pipelines, CLI logic, and display formatting.

---

## 🗂️ Project Structure

```text
src/
├── __init__.py
├── main.py              # CLI entry point
│
├── cli/
│   ├── __init__.py
│   ├── base.py          # CLICommand base class & interactive shell handler
│   └── display.py       # Console formatting utilities (tables, colors, results)
│
├── commands/            # User-facing commands wrapping pipeline logic
│   ├── __init__.py
│   ├── sentiment.py     # Sentiment analysis command
│   ├── fillmask.py      # Masked token prediction command
│   ├── textgen.py       # Text generation command
│   ├── ner.py           # Named Entity Recognition command
│   └── moderation.py    # Toxicity / content moderation command
│
├── pipelines/           # Machine learning logic (Hugging Face Transformers)
│   ├── __init__.py
│   ├── template.py      # Blueprint for creating new pipelines
│   ├── sentiment.py
│   ├── fillmask.py
│   ├── textgen.py
│   ├── ner.py
│   └── moderation.py
│
└── config/
    ├── __init__.py
    └── settings.py      # Global configuration (default models, parameters)
```

---

## ⚙️ Installation

### 🧾 Option 1 – Using Poetry (Recommended)

> Poetry is used as the main dependency manager.

```bash
# 1. Create and activate a new virtual environment
poetry shell

# 2. Install dependencies
poetry install
```

This automatically installs all dependencies declared in `pyproject.toml`, including **transformers** and **torch**.

To run the CLI inside the Poetry environment:

```bash
poetry run python src/main.py
```

---

### 📦 Option 2 – Using pip and requirements.txt

If you prefer to manage dependencies manually with `requirements.txt`:

```bash
# 1. Create a virtual environment
python -m venv .venv

# 2. Activate it
# Linux/macOS
source .venv/bin/activate
# Windows PowerShell
.venv\Scripts\Activate.ps1

# 3. Install dependencies
pip install -r requirements.txt
```

---

## ▶️ Usage

Once installed, launch the CLI with:

```bash
python -m src.main
# or, if using Poetry
poetry run python src/main.py
```

You’ll see an interactive menu listing the available commands:

```
Welcome to AI Lab - Transformers CLI Playground

Available commands:
  • sentiment  – Analyze the sentiment of a text
  • fillmask   – Predict masked words in a sentence
  • textgen    – Generate text from a prompt
  • ner        – Extract named entities from text
  • moderation – Detect toxic or unsafe content
```

### Example Sessions

#### 🔹 Sentiment Analysis
```text
💬 Enter text: I absolutely love this project!
→ Sentiment: POSITIVE (score: 0.998)
```

#### 🔹 Fill‑Mask
```text
💬 Enter text: The capital of France is [MASK].
→ Predictions:
  1) Paris    score: 0.87
  2) Lyon     score: 0.04
  3) London   score: 0.02
```

#### 🔹 Text Generation
```text
💬 Prompt: Once upon a time
→ Output: Once upon a time there was a young AI learning to code...
```

#### 🔹 NER (Named Entity Recognition)
```text
💬 Enter text: Elon Musk founded SpaceX in California.
→ Entities:
  - Elon Musk    (PERSON)
  - SpaceX       (ORG)
  - California   (LOC)
```

#### 🔹 Moderation
```text
💬 Enter text: I hate everything!
→ Result: FLAGGED (toxic content detected)
```

---

## 🧠 Architecture Overview

The internal structure follows a clean **Command ↔ Pipeline ↔ Display** pattern:

```text
┌──────────────────────┐
│    InteractiveCLI    │
│  (src/cli/base.py)   │
└──────────┬───────────┘
           │
           ▼
  ┌─────────────────┐
  │  Command Layer  │  ← e.g. sentiment.py
  │ (user commands) │
  └───────┬─────────┘
          │
          ▼
  ┌─────────────────┐
  │ Pipeline Layer  │  ← e.g. pipelines/sentiment.py
  │   (ML logic)    │
  └───────┬─────────┘
          │
          ▼
  ┌─────────────────┐
  │  Display Layer  │  ← cli/display.py
  │ (format output) │
  └─────────────────┘
```

### Key Concepts

| Layer | Description |
|-------|-------------|
| **CLI** | Manages user input/output, help menus, and navigation between commands. |
| **Command** | Encapsulates a single user-facing operation (e.g., run sentiment). |
| **Pipeline** | Wraps Hugging Face’s `transformers.pipeline()` to perform inference. |
| **Display** | Handles clean console rendering (colored output, tables, JSON formatting). |
| **Config** | Centralizes model names, limits, and global constants. |
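
The table above can be made concrete with a minimal, runnable sketch of the flow; the class names here are illustrative stand-ins, not the repository's actual implementations:

```python
class EchoPipeline:
    """Pipeline layer: would normally wrap transformers.pipeline()."""

    def analyze(self, text: str) -> dict:
        # Fixed result so the sketch runs without any model download
        return {"sentiment": "POSITIVE", "confidence": 0.99}


class EchoDisplay:
    """Display layer: turns a result dict into console-ready text."""

    @staticmethod
    def render(result: dict) -> str:
        return f"Sentiment: {result['sentiment']} ({result['confidence']:.0%})"


class EchoCommand:
    """Command layer: glues user input, pipeline, and display together."""

    def __init__(self):
        self.pipeline = EchoPipeline()

    def run(self, text: str) -> str:
        result = self.pipeline.analyze(text)
        return EchoDisplay.render(result)


print(EchoCommand().run("I love this project"))  # → Sentiment: POSITIVE (99%)
```

The point of the split is that the command never touches model internals and the pipeline never touches the console.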

---

## ⚙️ Configuration

All configuration is centralized in `src/config/settings.py`.

Example:

```python
class Config:
    DEFAULT_MODELS = {
        "sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
        "fillmask": "bert-base-uncased",
        "textgen": "gpt2",
        "ner": "dslim/bert-base-NER",
        "moderation": "unitary/toxic-bert",
    }
    MAX_LENGTH = 512
    BATCH_SIZE = 8
```

You can easily modify model names to experiment with different checkpoints.
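
For example, a pipeline can resolve its default checkpoint by task name. This sketch assumes the `Config` structure shown above (abbreviated) and a hypothetical `model_for` helper:

```python
class Config:
    # Mirrors the structure shown above (abbreviated)
    DEFAULT_MODELS = {
        "sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
        "textgen": "gpt2",
    }


def model_for(task: str) -> str:
    """Hypothetical helper: resolve the default checkpoint for a task."""
    try:
        return Config.DEFAULT_MODELS[task]
    except KeyError:
        raise ValueError(f"No default model configured for task '{task}'")


print(model_for("textgen"))  # → gpt2
```

Failing loudly on an unknown task name keeps configuration typos from silently falling back to the wrong model.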

---

## 🧩 Extending the Playground

To create a new experiment (e.g., keyword extraction):

1. **Duplicate** `src/pipelines/template.py` → `src/pipelines/keywords.py`
   Implement the `run()` or `analyze()` logic using a new Hugging Face pipeline.

2. **Create a Command** in `src/commands/keywords.py` to interact with users.

3. **Register the command** inside `src/main.py`:

```python
from src.commands.keywords import KeywordsCommand

cli.register_command(KeywordsCommand())
```

4. Optionally, add a model name in `Config.DEFAULT_MODELS`.
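
A minimal sketch of what steps 1–2 might produce (every name here is hypothetical; the real command would subclass `CLICommand` and lazily load its pipeline):

```python
class KeywordsCommand:
    """Hypothetical command for step 2. The real version would subclass
    CLICommand and lazily instantiate a KeywordsExtractor pipeline."""

    @property
    def name(self) -> str:
        return "keywords"

    @property
    def description(self) -> str:
        return "Extract keywords from text"

    def run_on(self, text: str, extractor) -> list:
        # The command layer stays thin: delegate ML work to the pipeline layer
        return extractor.extract(text)


class FakeExtractor:
    """Stand-in for the hypothetical pipelines/keywords.py logic."""

    def extract(self, text: str) -> list:
        return [w for w in text.split() if len(w) > 6]


cmd = KeywordsCommand()
print(cmd.run_on("transformers make keyword extraction simple", FakeExtractor()))
```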

---

## 🧪 Testing

You can use `pytest` for lightweight validation:

```bash
pip install pytest
pytest -q
```

Recommended structure:

```
tests/
├── test_sentiment.py
├── test_textgen.py
└── ...
```
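
A pipeline-layer test can fake the underlying model call so it runs instantly, with no checkpoint download. The `SentimentAnalyzer` below is a simplified stand-in for the repository's actual class:

```python
class FakePipeline:
    """Stand-in for transformers.pipeline(): instant, deterministic results."""

    def __call__(self, text):
        return [{"label": "POSITIVE", "score": 0.99}]


class SentimentAnalyzer:
    """Simplified stand-in for the repository's pipeline class."""

    def __init__(self, pipeline):
        self.pipeline = pipeline

    def analyze(self, text: str) -> dict:
        if not text.strip():
            return {"error": "Empty text"}
        raw = self.pipeline(text)[0]
        return {"sentiment": raw["label"], "confidence": raw["score"]}


def test_analyze_returns_sentiment():
    result = SentimentAnalyzer(FakePipeline()).analyze("great")
    assert result["sentiment"] == "POSITIVE"


def test_empty_text_is_an_error():
    assert "error" in SentimentAnalyzer(FakePipeline()).analyze("   ")
```

Injecting the pipeline through the constructor is what makes this kind of fast test possible.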

---

## 🧰 Troubleshooting

| Issue | Cause / Solution |
|-------|------------------|
| **`transformers` not found** | Check that the virtual environment is activated. |
| **Torch fails to install** | Install the CPU-only build from the PyTorch index. |
| **Models download slowly** | Hugging Face caches models after the first run. |
| **Unicode / accents broken** | Ensure your terminal encoding is UTF‑8. |
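
For the Torch installation issue specifically, the CPU-only build can usually be installed from the official PyTorch wheel index (check the PyTorch install page for the exact command for your platform):

```shell
# CPU-only torch wheel, avoids the large CUDA download
pip install torch --index-url https://download.pytorch.org/whl/cpu
```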

---

## 🧭 Development Guidelines

- Keep **Command** classes lightweight — no ML logic inside them.
- Reuse the **Pipeline Template** for new experiments.
- Format outputs consistently via the `DisplayFormatter`.
- Document all new models or commands in `README.md` and `settings.py`.

---

## 🧱 Roadmap

- [ ] Add non-interactive CLI flags (`--text`, `--task`)
- [ ] Add multilingual model options
- [ ] Add automatic test coverage
- [ ] Add logging and profiling utilities
- [ ] Add JSON/CSV export of results

---

## 🪪 License

You can include a standard open-source license such as **MIT** or **Apache 2.0**, depending on your use case.

---

## 🤝 Contributing

This repository is meant as an **educational sandbox** for experimenting with Transformers.
Pull requests are welcome for new models, better CLI UX, or educational improvements.

---

### ✨ Key Takeaways

- Modular and pedagogical design for training environments
- Clean separation between **I/O**, **ML logic**, and **UX**
- Easily extensible architecture for adding custom pipelines
- A practical sandbox for students, researchers, and developers learning modern NLP tools

---

> 🧩 Built for experimentation. Learn, break, and rebuild.

File diff suppressed because it is too large

@@ -0,0 +1,27 @@

[project]
name = "ai-lab"
version = "0.1.0"
description = "Lab for testing different uses of transformers"
authors = [{ name = "Cyril", email = "decostanzicyril@gmail.com" }]

[tool.poetry]
name = "ai-lab"
version = "0.1.0"
description = "Lab for testing different uses of transformers"
authors = ["Cyril"]
packages = [{ include = "src" }]

[tool.poetry.dependencies]
python = ">=3.12,<3.14"
torch = "^2.0.0"
transformers = "^4.30.0"
tokenizers = "^0.13.0"
numpy = "^1.24.0"
accelerate = "^0.20.0"

[tool.poetry.scripts]
ai-lab = "src.main:main"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

@@ -0,0 +1,4 @@

torch>=2.0.0
transformers>=4.30.0
tokenizers>=0.13.0
numpy>=1.24.0

@@ -0,0 +1,4 @@

"""
|
||||||
|
AI Lab - Transformers Experimentation
|
||||||
|
"""
|
||||||
|
__version__ = "0.1.0"

@@ -0,0 +1,7 @@

"""
|
||||||
|
CLI utilities for AI Lab
|
||||||
|
"""
|
||||||
|
from .base import CLICommand, InteractiveCLI
|
||||||
|
from .display import DisplayFormatter
|
||||||
|
|
||||||
|
__all__ = ['CLICommand', 'InteractiveCLI', 'DisplayFormatter']

@@ -0,0 +1,87 @@

from abc import ABC, abstractmethod
from typing import Dict, Any
from src.config import Config


class CLICommand(ABC):
    """Base class for CLI commands"""

    @property
    @abstractmethod
    def name(self) -> str:
        """Command name"""
        pass

    @property
    @abstractmethod
    def description(self) -> str:
        """Command description"""
        pass

    @abstractmethod
    def run(self) -> None:
        """Execute the command"""
        pass


class InteractiveCLI:
    """Interactive CLI handler"""

    def __init__(self):
        self.commands: Dict[str, CLICommand] = {}

    def register_command(self, command: CLICommand):
        """Register a new command"""
        self.commands[command.name] = command

    def show_menu(self):
        """Display available commands"""
        print(Config.CLI_BANNER)
        print(Config.CLI_SEPARATOR)
        print("Available commands:")
        for name, cmd in self.commands.items():
            print(f"  📌 {name}: {cmd.description}")
        print("  📌 quit: Exit application")
        print("  📌 help: Show this help")
        print("-" * 50)

    def show_help(self):
        """Show detailed help"""
        print("\n📚 Detailed Help")
        print("-" * 30)
        print("Navigation:")
        print("  - Type a command name to execute it")
        print("  - Type 'back' in a command to return to menu")
        print("  - Type 'quit' or Ctrl+C to exit")
        print("\nAvailable commands:")
        for name, cmd in self.commands.items():
            print(f"  {name}: {cmd.description}")

    def run(self):
        """Run the interactive CLI"""
        self.show_menu()

        while True:
            try:
                choice = input("\n💬 Choose a command: ").strip().lower()

                if choice in ['quit', 'exit', 'q']:
                    print("👋 Goodbye!")
                    break

                if choice in ['help', 'h', '?']:
                    self.show_help()
                    continue

                if choice in self.commands:
                    print()  # Empty line for readability
                    self.commands[choice].run()
                    print()  # Empty line after command
                else:
                    print("❌ Unknown command. Type 'help' to see available commands.")

            except KeyboardInterrupt:
                print("\n👋 Stopping program")
                break
            except Exception as e:
                print(f"❌ Error: {e}")

@@ -0,0 +1,192 @@

from typing import Dict, Any


class DisplayFormatter:
    """Utility class for formatting display output"""

    @staticmethod
    def format_sentiment_result(result: Dict[str, Any]) -> str:
        """Format sentiment analysis result for display"""
        if "error" in result:
            return f"❌ {result['error']}"

        sentiment = result["sentiment"]
        confidence = result["confidence"]
        emoji = "😊" if sentiment == "POSITIVE" else "😞"

        return f"{emoji} Sentiment: {sentiment}\n📊 Confidence: {confidence:.2%}"

    @staticmethod
    def show_loading(message: str = "Analysis in progress..."):
        """Show loading message"""
        print(f"\n🔍 {message}")

    @staticmethod
    def show_warning(message: str):
        """Show warning message"""
        print(f"⚠️ {message}")

    @staticmethod
    def show_error(message: str):
        """Show error message"""
        print(f"❌ {message}")

    @staticmethod
    def show_success(message: str):
        """Show success message"""
        print(f"✅ {message}")

    @staticmethod
    def format_fillmask_result(result: Dict[str, Any]) -> str:
        """Format fill-mask prediction result for display"""
        if "error" in result:
            return f"❌ {result['error']}"

        output = []
        output.append(f"📝 Original: {result['original_text']}")
        output.append(f"🎭 Masks found: {result['masks_count']}")
        output.append("")

        if result['masks_count'] == 1:
            # Single mask
            output.append("🔮 Predictions:")
            for i, pred in enumerate(result['predictions'], 1):
                confidence_bar = "█" * int(pred['score'] * 10)
                output.append(f"  {i}. '{pred['token']}' ({pred['score']:.1%}) {confidence_bar}")
                output.append(f"     → {pred['sequence']}")
        else:
            # Multiple masks
            for mask_info in result['predictions']:
                output.append(f"🔮 Mask #{mask_info['mask_position']} predictions:")
                for i, pred in enumerate(mask_info['predictions'], 1):
                    confidence_bar = "█" * int(pred['score'] * 10)
                    output.append(f"  {i}. '{pred['token']}' ({pred['score']:.1%}) {confidence_bar}")
                output.append("")

        return "\n".join(output)

    @staticmethod
    def format_textgen_result(result: Dict[str, Any]) -> str:
        """Format text generation result for display"""
        if "error" in result:
            return f"❌ {result['error']}"

        output = []
        output.append(f"📝 Prompt: {result['prompt']}")
        output.append(f"⚙️ Parameters: max_length={result['parameters']['max_length']}, "
                      f"temperature={result['parameters']['temperature']}")
        output.append("-" * 50)

        for i, gen in enumerate(result['generations'], 1):
            if len(result['generations']) > 1:
                output.append(f"🎯 Generation {i}:")

            output.append(f"📄 Full text: {gen['text']}")
            if gen['continuation']:
                output.append(f"✨ Continuation: {gen['continuation']}")

            if i < len(result['generations']):
                output.append("-" * 30)

        return "\n".join(output)

    @staticmethod
    def format_moderation_result(result: Dict[str, Any]) -> str:
        """Format content moderation result for display"""
        if "error" in result:
            return f"❌ {result['error']}"

        output = []
        output.append(f"📝 Original: {result['original_text']}")

        if result['is_modified']:
            output.append(f"🛡️ Moderated: {result['moderated_text']}")
            output.append(f"⚠️ Status: Content modified ({result['words_replaced']} words replaced)")
            status_emoji = "🔴"
        else:
            output.append("✅ Status: Content approved (no modifications needed)")
            status_emoji = "🟢"

        # Toxicity score bar
        score = result['toxic_score']
        score_bar = "█" * int(score * 10)
        output.append(f"{status_emoji} Toxicity Score: {score:.1%} {score_bar}")

        return "\n".join(output)

    @staticmethod
    def format_ner_result(result: Dict[str, Any]) -> str:
        """Format NER result for display"""
        if "error" in result:
            return f"❌ {result['error']}"

        output = []
        output.append(f"📝 Original: {result['original_text']}")
        output.append(f"✨ Highlighted: {result['highlighted_text']}")
        output.append(f"🎯 Found {result['total_entities']} entities (threshold: {result['confidence_threshold']:.2f})")

        if result['entities']:
            output.append("\n📋 Detected Entities:")
            for entity in result['entities']:
                confidence_bar = "█" * int(entity['confidence'] * 10)
                output.append(f"  {entity['emoji']} {entity['text']} → {entity['label']} "
                              f"({entity['confidence']:.1%}) {confidence_bar}")

        if result['entity_stats']:
            output.append("\n📊 Entity Statistics:")
            for entity_type, stats in result['entity_stats'].items():
                unique_entities = list(set(stats['entities']))
                emoji = result['entities'][0]['emoji'] if result['entities'] else "🏷️"
                for ent in result['entities']:
                    if ent['label'] == entity_type:
                        emoji = ent['emoji']
                        break

                output.append(f"  {emoji} {entity_type}: {stats['count']} occurrences")
                if len(unique_entities) <= 3:
                    output.append(f"     → {', '.join(unique_entities)}")
                else:
                    output.append(f"     → {', '.join(unique_entities[:3])}... (+{len(unique_entities)-3} more)")

        return "\n".join(output)

    @staticmethod
    def format_ner_analysis(result: Dict[str, Any]) -> str:
        """Format comprehensive NER document analysis"""
        if "error" in result:
            return f"❌ {result['error']}"

        output = []
        output.append("📊 Document Analysis Results")
        output.append("=" * 50)

        # Document statistics
        stats = result['document_stats']
        output.append(f"📄 Document: {stats['word_count']} words, {stats['char_count']} characters")
        output.append(f"📝 Structure: ~{stats['sentence_count']} sentences")
        output.append(f"🎯 Entity Density: {stats['entity_density']:.2%} (entities per word)")

        # Most common entity type
        if 'most_common_entity_type' in result:
            common = result['most_common_entity_type']
            output.append(f"🏆 Most Common: {common['emoji']} {common['type']} ({common['count']} occurrences)")

        output.append("\n✨ Highlighted Text:")
        output.append(result['highlighted_text'])

        if result['entity_stats']:
            output.append("\n📈 Detailed Statistics:")
            for entity_type, stats in result['entity_stats'].items():
                unique_entities = list(set(stats['entities']))
                emoji = "🏷️"
                for ent in result['entities']:
                    if ent['label'] == entity_type:
                        emoji = ent['emoji']
                        break

                output.append(f"\n{emoji} {entity_type} ({stats['count']} total):")
                for entity in unique_entities:
                    count = stats['entities'].count(entity)
                    output.append(f"  • {entity} ({count}x)")

        return "\n".join(output)

@@ -0,0 +1,10 @@

"""
|
||||||
|
AI Lab commands
|
||||||
|
"""
|
||||||
|
from .sentiment import SentimentCommand
|
||||||
|
from .fillmask import FillMaskCommand
|
||||||
|
from .textgen import TextGenCommand
|
||||||
|
from .moderation import ModerationCommand
|
||||||
|
from .ner import NERCommand
|
||||||
|
|
||||||
|
__all__ = ['SentimentCommand', 'FillMaskCommand', 'TextGenCommand', 'ModerationCommand', 'NERCommand']

@@ -0,0 +1,84 @@

from src.cli.base import CLICommand
from src.cli.display import DisplayFormatter
from src.pipelines.fillmask import FillMaskAnalyzer


class FillMaskCommand(CLICommand):
    """Interactive fill-mask prediction command"""

    def __init__(self):
        self.analyzer = None

    @property
    def name(self) -> str:
        return "fillmask"

    @property
    def description(self) -> str:
        return "Interactive fill-mask token prediction"

    def _initialize_analyzer(self):
        """Lazy initialization of the analyzer"""
        if self.analyzer is None:
            print("🔄 Loading fill-mask model...")
            self.analyzer = FillMaskAnalyzer()
            DisplayFormatter.show_success("Model loaded!")

    def _show_instructions(self):
        """Show usage instructions"""
        print("\n📝 Fill-Mask Prediction")
        print("Replace words with [MASK] token and get predictions")
        print("\nExamples:")
        print("  - The weather today is [MASK]")
        print("  - I love to [MASK] music")
        print("  - Paris is the capital of [MASK]")
        print("\nType 'back' to return to main menu")
        print("Type 'help' to see these instructions again")
        print("-" * 50)

    def _get_top_k(self) -> int:
        """Get number of predictions from user"""
        while True:
            try:
                top_k_input = input("📊 Number of predictions (1-10, default=5): ").strip()
                if not top_k_input:
                    return 5

                top_k = int(top_k_input)
                if 1 <= top_k <= 10:
                    return top_k
                else:
                    DisplayFormatter.show_warning("Please enter a number between 1 and 10")
            except ValueError:
                DisplayFormatter.show_warning("Please enter a valid number")

    def run(self):
        """Run interactive fill-mask prediction"""
        self._initialize_analyzer()
        self._show_instructions()

        while True:
            text = input("\n💬 Enter text with [MASK]: ").strip()

            if text.lower() in ['back', 'return']:
                break

            if text.lower() == 'help':
                self._show_instructions()
                continue

            if not text:
                DisplayFormatter.show_warning("Please enter some text")
                continue

            if "[MASK]" not in text:
                DisplayFormatter.show_warning("Text must contain [MASK] token")
                continue

            # Get number of predictions
            top_k = self._get_top_k()

            DisplayFormatter.show_loading("Predicting tokens...")
            result = self.analyzer.predict(text, top_k=top_k)
            formatted_result = DisplayFormatter.format_fillmask_result(result)
            print(formatted_result)

@@ -0,0 +1,73 @@

from src.cli.base import CLICommand
from src.cli.display import DisplayFormatter
from src.pipelines.moderation import ContentModerator


class ModerationCommand(CLICommand):
    """Interactive content moderation command"""

    def __init__(self):
        self.moderator = None

    @property
    def name(self) -> str:
        return "moderation"

    @property
    def description(self) -> str:
        return "Content moderation and filtering"

    def _initialize_moderator(self):
        """Lazy initialization of the moderator"""
        if self.moderator is None:
            print("🔄 Loading content moderation model...")
            self.moderator = ContentModerator()
            DisplayFormatter.show_success("Moderation model loaded!")

    def run(self):
        """Run interactive content moderation"""
        self._initialize_moderator()

        print("\n🛡️ Content Moderation")
        print("Type 'back' to return to main menu")
        print("Type 'settings' to adjust moderation sensitivity")
        print("-" * 40)

        while True:
            text = input("\n📝 Enter text to moderate: ").strip()

            if text.lower() in ['back', 'return']:
                break

            if text.lower() == 'settings':
                self._show_settings()
                continue

            if not text:
                DisplayFormatter.show_warning("Please enter some text")
                continue

            DisplayFormatter.show_loading("Analyzing content...")
            result = self.moderator.moderate(text)
            formatted_result = DisplayFormatter.format_moderation_result(result)
            print(formatted_result)

    def _show_settings(self):
        """Show and allow modification of moderation settings"""
        print("\n⚙️ Current Settings:")
        print(f"Toxicity threshold: {self.moderator.toxicity_threshold:.2f}")
        print("\nOptions:")
        print("1. Change threshold (0.0 = very strict, 1.0 = very permissive)")
        print("2. Back to moderation")

        choice = input("\nChoose option (1-2): ").strip()

        if choice == "1":
            try:
                new_threshold = float(input("Enter new threshold (0.0-1.0): "))
                self.moderator.set_threshold(new_threshold)
                DisplayFormatter.show_success(f"Threshold set to {new_threshold:.2f}")
            except ValueError:
                DisplayFormatter.show_error("Invalid threshold value")
        elif choice != "2":
            DisplayFormatter.show_warning("Invalid option")

@@ -0,0 +1,137 @@

from src.cli.base import CLICommand
from src.cli.display import DisplayFormatter
from src.pipelines.ner import NamedEntityRecognizer


class NERCommand(CLICommand):
    """Interactive Named Entity Recognition command"""

    def __init__(self):
        self.recognizer = None
        self.confidence_threshold = 0.9

    @property
    def name(self) -> str:
        return "ner"

    @property
    def description(self) -> str:
        return "Named Entity Recognition - Extract people, places, organizations"

    def _initialize_recognizer(self):
        """Lazy initialization of the recognizer"""
        if self.recognizer is None:
            print("🔄 Loading NER model...")
            self.recognizer = NamedEntityRecognizer()
            DisplayFormatter.show_success("NER model loaded!")

    def _show_instructions(self):
        """Show usage instructions and examples"""
        print("\n🎯 Named Entity Recognition")
        print("Extract and classify entities like people, organizations, locations, etc.")
        print("\n📝 Examples to try:")
        print("  - Apple Inc. was founded by Steve Jobs in Cupertino, California.")
        print("  - Barack Obama visited Paris in 2015 to meet Emmanuel Macron.")
        print("  - Microsoft acquired GitHub for $7.5 billion in June 2018.")
        print("\n🎛️ Commands:")
        print("  'back' - Return to main menu")
        print("  'help' - Show these instructions")
        print("  'settings' - Adjust confidence threshold")
        print("  'types' - Show entity types")
        print("  'analyze' - Detailed document analysis mode")
        print("-" * 60)

    def _show_entity_types(self):
        """Show available entity types"""
        entity_types = self.recognizer.get_entity_types()
        print("\n🏷️ Entity Types:")
        type_descriptions = {
            "PER": "Person names",
            "ORG": "Organizations, companies",
            "LOC": "Locations, places",
            "MISC": "Miscellaneous entities",
            "DATE": "Dates and time periods",
            "TIME": "Specific times",
            "MONEY": "Monetary amounts",
            "PERCENT": "Percentages"
        }
|
|
||||||
|
for entity_type, emoji in entity_types.items():
|
||||||
|
description = type_descriptions.get(entity_type, "Other entities")
|
||||||
|
print(f" {emoji} {entity_type}: {description}")
|
||||||
|
|
||||||
|
def _adjust_settings(self):
|
||||||
|
"""Allow user to adjust confidence threshold"""
|
||||||
|
print(f"\n⚙️ Current confidence threshold: {self.confidence_threshold:.2f}")
|
||||||
|
print("Lower values = more entities detected (but less accurate)")
|
||||||
|
print("Higher values = fewer entities detected (but more accurate)")
|
||||||
|
|
||||||
|
try:
|
||||||
|
new_threshold = input(f"Enter new threshold (0.1-1.0, current: {self.confidence_threshold}): ").strip()
|
||||||
|
if new_threshold:
|
||||||
|
threshold = float(new_threshold)
|
||||||
|
if 0.1 <= threshold <= 1.0:
|
||||||
|
self.confidence_threshold = threshold
|
||||||
|
DisplayFormatter.show_success(f"Threshold set to {threshold:.2f}")
|
||||||
|
else:
|
||||||
|
DisplayFormatter.show_warning("Threshold must be between 0.1 and 1.0")
|
||||||
|
except ValueError:
|
||||||
|
DisplayFormatter.show_error("Invalid threshold value")
|
||||||
|
|
||||||
|
def _analyze_mode(self):
|
||||||
|
"""Document analysis mode with detailed statistics"""
|
||||||
|
print("\n📊 Document Analysis Mode")
|
||||||
|
print("Enter longer text for comprehensive entity analysis")
|
||||||
|
print("Type 'done' when finished")
|
||||||
|
print("-" * 40)
|
||||||
|
|
||||||
|
lines = []
|
||||||
|
while True:
|
||||||
|
line = input("📝 ").strip()
|
||||||
|
if line.lower() == 'done':
|
||||||
|
break
|
||||||
|
if line:
|
||||||
|
lines.append(line)
|
||||||
|
|
||||||
|
if not lines:
|
||||||
|
DisplayFormatter.show_warning("No text entered")
|
||||||
|
return
|
||||||
|
|
||||||
|
document = " ".join(lines)
|
||||||
|
DisplayFormatter.show_loading("Analyzing document...")
|
||||||
|
|
||||||
|
result = self.recognizer.analyze_document(document, self.confidence_threshold)
|
||||||
|
formatted_result = DisplayFormatter.format_ner_analysis(result)
|
||||||
|
print(formatted_result)
|
||||||
|
|
||||||
|
def run(self):
|
||||||
|
"""Run interactive NER"""
|
||||||
|
self._initialize_recognizer()
|
||||||
|
self._show_instructions()
|
||||||
|
|
||||||
|
while True:
|
||||||
|
text = input("\n💬 Enter text to analyze: ").strip()
|
||||||
|
|
||||||
|
if text.lower() == 'back':
|
||||||
|
break
|
||||||
|
elif text.lower() == 'help':
|
||||||
|
self._show_instructions()
|
||||||
|
continue
|
||||||
|
elif text.lower() == 'settings':
|
||||||
|
self._adjust_settings()
|
||||||
|
continue
|
||||||
|
elif text.lower() == 'types':
|
||||||
|
self._show_entity_types()
|
||||||
|
continue
|
||||||
|
elif text.lower() == 'analyze':
|
||||||
|
self._analyze_mode()
|
||||||
|
continue
|
||||||
|
|
||||||
|
if not text:
|
||||||
|
DisplayFormatter.show_warning("Please enter some text")
|
||||||
|
continue
|
||||||
|
|
||||||
|
DisplayFormatter.show_loading("Extracting entities...")
|
||||||
|
result = self.recognizer.recognize(text, self.confidence_threshold)
|
||||||
|
formatted_result = DisplayFormatter.format_ner_result(result)
|
||||||
|
print(formatted_result)
|
||||||
|
|
@@ -0,0 +1,48 @@
from src.cli.base import CLICommand
from src.cli.display import DisplayFormatter
from src.pipelines.sentiment import SentimentAnalyzer


class SentimentCommand(CLICommand):
    """Interactive sentiment analysis command"""

    def __init__(self):
        self.analyzer = None

    @property
    def name(self) -> str:
        return "sentiment"

    @property
    def description(self) -> str:
        return "Interactive sentiment analysis"

    def _initialize_analyzer(self):
        """Lazy initialization of the analyzer"""
        if self.analyzer is None:
            print("🔄 Loading sentiment model...")
            self.analyzer = SentimentAnalyzer()
            DisplayFormatter.show_success("Model loaded!")

    def run(self):
        """Run interactive sentiment analysis"""
        self._initialize_analyzer()

        print("\n📝 Sentiment Analysis")
        print("Type 'back' to return to main menu")
        print("-" * 30)

        while True:
            text = input("\n💬 Enter your text: ").strip()

            if text.lower() in ['back', 'return']:
                break

            if not text:
                DisplayFormatter.show_warning("Please enter some text")
                continue

            DisplayFormatter.show_loading()
            result = self.analyzer.analyze(text)
            formatted_result = DisplayFormatter.format_sentiment_result(result)
            print(formatted_result)
@@ -0,0 +1,95 @@
from src.cli.base import CLICommand
from src.cli.display import DisplayFormatter
from src.pipelines.textgen import TextGenerator


class TextGenCommand(CLICommand):
    """Interactive text generation command"""

    def __init__(self):
        self.generator = None
        self.default_params = {
            'max_length': 100,
            'num_return_sequences': 1,
            'temperature': 1.0,
            'do_sample': True
        }

    @property
    def name(self) -> str:
        return "textgen"

    @property
    def description(self) -> str:
        return "Interactive text generation"

    def _initialize_generator(self):
        """Lazy initialization of the generator"""
        if self.generator is None:
            print("🔄 Loading text generation model...")
            self.generator = TextGenerator()
            DisplayFormatter.show_success("Model loaded!")

    def _show_parameters(self):
        """Show current generation parameters"""
        print("\n⚙️ Current parameters:")
        for key, value in self.default_params.items():
            print(f"  {key}: {value}")

    def _update_parameters(self):
        """Allow user to update generation parameters"""
        print("\n🔧 Update parameters (press Enter to keep current value):")

        try:
            max_length = input(f"Max length ({self.default_params['max_length']}): ").strip()
            if max_length:
                self.default_params['max_length'] = int(max_length)

            num_sequences = input(f"Number of sequences ({self.default_params['num_return_sequences']}): ").strip()
            if num_sequences:
                self.default_params['num_return_sequences'] = int(num_sequences)

            temperature = input(f"Temperature ({self.default_params['temperature']}): ").strip()
            if temperature:
                self.default_params['temperature'] = float(temperature)

            do_sample = input(f"Use sampling ({self.default_params['do_sample']}): ").strip().lower()
            if do_sample in ['true', 'false']:
                self.default_params['do_sample'] = do_sample == 'true'

            DisplayFormatter.show_success("Parameters updated!")

        except ValueError as e:
            DisplayFormatter.show_error(f"Invalid parameter value: {e}")

    def run(self):
        """Run interactive text generation"""
        self._initialize_generator()

        print("\n📝 Text Generation")
        print("Commands:")
        print("  'back' - Return to main menu")
        print("  'params' - Show current parameters")
        print("  'config' - Update parameters")
        print("-" * 40)

        while True:
            prompt = input("\n💬 Enter your prompt: ").strip()

            if prompt.lower() == 'back':
                break
            elif prompt.lower() == 'params':
                self._show_parameters()
                continue
            elif prompt.lower() == 'config':
                self._update_parameters()
                continue

            if not prompt:
                DisplayFormatter.show_warning("Please enter a prompt")
                continue

            DisplayFormatter.show_loading("Generating text...")
            result = self.generator.generate(prompt, **self.default_params)
            formatted_result = DisplayFormatter.format_textgen_result(result)
            print(formatted_result)
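The `temperature` parameter above rescales the model's logits before sampling; `do_sample=False` switches to greedy decoding, where temperature has no effect. A minimal, self-contained sketch of the temperature effect (illustrative only, not the generator's actual code):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities; temperature < 1 sharpens the
    distribution, temperature > 1 flattens it toward uniform."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

At `temperature=0.5` the top token's probability grows relative to `temperature=2.0`, which is why low temperatures produce more repetitive, deterministic text.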
@@ -0,0 +1,6 @@
"""
Project configuration
"""
from .settings import Config

__all__ = ['Config']
@@ -0,0 +1,40 @@
"""
Global project configuration
"""
from pathlib import Path
from typing import Dict, Any


class Config:
    """Global application configuration"""

    # Paths
    PROJECT_ROOT = Path(__file__).parent.parent.parent
    SRC_DIR = PROJECT_ROOT / "src"

    # Default models
    DEFAULT_MODELS = {
        "sentiment": "cardiffnlp/twitter-roberta-base-sentiment-latest",
        "fillmask": "distilbert-base-uncased",
        "textgen": "gpt2",
        "moderation": "unitary/toxic-bert",
        "ner": "dbmdz/bert-large-cased-finetuned-conll03-english",
    }

    # Interface
    CLI_BANNER = "🤖 AI Lab - Transformers Experimentation"
    CLI_SEPARATOR = "=" * 50

    # Performance
    MAX_BATCH_SIZE = 32
    DEFAULT_MAX_LENGTH = 512

    @classmethod
    def get_model(cls, pipeline_name: str) -> str:
        """Get default model for a pipeline"""
        return cls.DEFAULT_MODELS.get(pipeline_name, "")

    @classmethod
    def get_all_models(cls) -> Dict[str, str]:
        """Get all configured models"""
        return cls.DEFAULT_MODELS.copy()
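`Config.get_model` falls back to an empty string rather than raising `KeyError` for unknown pipeline names. A quick sketch of that lookup behaviour, using a stand-in dict so it runs without the rest of the project:

```python
# Stand-in for Config.DEFAULT_MODELS, showing the fallback in Config.get_model:
# unknown pipeline names yield "" instead of raising KeyError.
DEFAULT_MODELS = {
    "sentiment": "cardiffnlp/twitter-roberta-base-sentiment-latest",
    "textgen": "gpt2",
}

def get_model(pipeline_name: str) -> str:
    # dict.get with a default mirrors the classmethod above
    return DEFAULT_MODELS.get(pipeline_name, "")
```

Callers therefore need to treat an empty string as "no model configured", not as a valid model id.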
@@ -0,0 +1,38 @@
#!/usr/bin/env python3
"""
CLI entry point for AI Lab
"""
import sys
from pathlib import Path

# Add parent directory to PYTHONPATH
sys.path.insert(0, str(Path(__file__).parent.parent))

from src.cli import InteractiveCLI
from src.commands import SentimentCommand, FillMaskCommand, TextGenCommand, ModerationCommand, NERCommand


def main():
    """Main CLI function"""
    try:
        # Create CLI interface
        cli = InteractiveCLI()

        # Register available commands
        cli.register_command(SentimentCommand())
        cli.register_command(FillMaskCommand())
        cli.register_command(TextGenCommand())
        cli.register_command(ModerationCommand())
        cli.register_command(NERCommand())

        # Launch interactive interface
        cli.run()

    except KeyboardInterrupt:
        print("\n👋 Stopping program")
    except Exception as e:
        print(f"❌ Error: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
@@ -0,0 +1,11 @@
"""
Experimentation pipelines with transformers
"""
from .sentiment import SentimentAnalyzer
from .fillmask import FillMaskAnalyzer
from .textgen import TextGenerator
from .moderation import ContentModerator
from .ner import NamedEntityRecognizer
from .template import TemplatePipeline

__all__ = ['SentimentAnalyzer', 'FillMaskAnalyzer', 'TextGenerator', 'ContentModerator', 'NamedEntityRecognizer', 'TemplatePipeline']
@@ -0,0 +1,95 @@
from transformers import pipeline
from typing import Dict, List, Optional
from src.config import Config


class FillMaskAnalyzer:
    """Fill-mask analyzer using transformers"""

    def __init__(self, model_name: Optional[str] = None):
        """
        Initialize the fill-mask pipeline

        Args:
            model_name: Name of the model to use (optional)
        """
        self.model_name = model_name or Config.get_model("fillmask")
        print(f"Loading fill-mask model: {self.model_name}")
        self.pipeline = pipeline("fill-mask", model=self.model_name)
        print("Model loaded successfully!")

    def predict(self, text: str, top_k: int = 5) -> Dict:
        """
        Predict masked tokens in text

        Args:
            text: Text with [MASK] token(s) to predict
            top_k: Number of top predictions to return

        Returns:
            Dictionary with predictions and scores
        """
        if not text.strip():
            return {"error": "Empty text"}

        if "[MASK]" not in text:
            return {"error": "Text must contain [MASK] token"}

        try:
            results = self.pipeline(text, top_k=top_k)

            # Handle single mask vs multiple masks
            if isinstance(results, list) and isinstance(results[0], list):
                # Multiple masks
                predictions = []
                for i, mask_results in enumerate(results):
                    mask_predictions = [
                        {
                            "token": pred["token_str"],
                            "score": round(pred["score"], 4),
                            "sequence": pred["sequence"]
                        }
                        for pred in mask_results
                    ]
                    predictions.append({
                        "mask_position": i + 1,
                        "predictions": mask_predictions
                    })

                return {
                    "original_text": text,
                    "masks_count": len(results),
                    "predictions": predictions
                }
            else:
                # Single mask
                predictions = [
                    {
                        "token": pred["token_str"],
                        "score": round(pred["score"], 4),
                        "sequence": pred["sequence"]
                    }
                    for pred in results
                ]

                return {
                    "original_text": text,
                    "masks_count": 1,
                    "predictions": predictions
                }

        except Exception as e:
            return {"error": f"Prediction error: {str(e)}"}

    def predict_batch(self, texts: List[str], top_k: int = 5) -> List[Dict]:
        """
        Predict masked tokens for multiple texts

        Args:
            texts: List of texts with [MASK] tokens
            top_k: Number of top predictions to return

        Returns:
            List of prediction results
        """
        return [self.predict(text, top_k) for text in texts]
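The branch in `predict` exists because the fill-mask pipeline returns a flat list of predictions when the text contains one `[MASK]`, and a list of per-mask lists when it contains several. A minimal sketch of that shape normalization, on plain data with no model:

```python
def normalize_mask_results(results):
    """Wrap single-mask output so both shapes become a list of per-mask lists."""
    if results and isinstance(results[0], list):
        return results   # multiple masks: already one list per mask
    return [results]     # single mask: wrap the flat list
```

Normalizing up front like this would let both branches of `predict` share one loop instead of duplicating the prediction-dict construction.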
@@ -0,0 +1,174 @@
from transformers import pipeline
from typing import Dict, List, Optional
import re
from src.config import Config


class ContentModerator:
    """Content moderator that detects and replaces inappropriate content"""

    def __init__(self, model_name: Optional[str] = None):
        """
        Initialize the content moderation pipeline

        Args:
            model_name: Name of the model to use (optional)
        """
        self.model_name = model_name or Config.get_model("moderation")
        print(f"Loading moderation model: {self.model_name}")
        self.classifier = pipeline("text-classification", model=self.model_name)
        print("Moderation model loaded successfully!")

        # Threshold above which content is considered toxic
        self.toxicity_threshold = 0.5

    def moderate(self, text: str, replacement: str = "***") -> Dict:
        """
        Moderate content by detecting and replacing inappropriate words

        Args:
            text: Text to moderate
            replacement: String to replace inappropriate content with

        Returns:
            Dictionary with original text, moderated text, and detection info
        """
        if not text.strip():
            return {"error": "Empty text"}

        try:
            # First, check overall toxicity
            result = self.classifier(text)

            # Handle different model output formats
            if isinstance(result, list):
                predictions = result
            else:
                predictions = [result]

            # Find toxicity score
            toxic_score = 0.0
            is_toxic = False

            for pred in predictions:
                label = pred["label"].upper()
                score = pred["score"]

                # Check different possible toxic labels
                if label in ["TOXIC", "TOXICITY", "HARMFUL", "1"]:
                    toxic_score = max(toxic_score, score)
                    if score > self.toxicity_threshold:
                        is_toxic = True
                elif label in ["NOT_TOXIC", "CLEAN", "0"]:
                    # For models where a high score means NOT toxic
                    toxic_score = max(toxic_score, 1.0 - score)
                    if (1.0 - score) > self.toxicity_threshold:
                        is_toxic = True

            if not is_toxic:
                return {
                    "original_text": text,
                    "moderated_text": text,
                    "is_modified": False,
                    "toxic_score": toxic_score,
                    "words_replaced": 0
                }

            # If toxic, analyze word by word to find problematic parts
            moderated_text, words_replaced = self._moderate_by_words(text, replacement)

            return {
                "original_text": text,
                "moderated_text": moderated_text,
                "is_modified": True,
                "toxic_score": toxic_score,
                "words_replaced": words_replaced
            }

        except Exception as e:
            return {"error": f"Moderation error: {str(e)}"}

    def _moderate_by_words(self, text: str, replacement: str) -> tuple[str, int]:
        """
        Moderate text by analyzing individual words and phrases

        Args:
            text: Original text
            replacement: Replacement string

        Returns:
            Tuple of (moderated_text, words_replaced_count)
        """
        words = text.split()
        moderated_words = []
        words_replaced = 0

        # Check individual words
        for word in words:
            # Clean word for analysis (remove punctuation)
            clean_word = re.sub(r'[^\w]', '', word)
            if not clean_word:
                moderated_words.append(word)
                continue

            try:
                word_result = self.classifier(clean_word)

                # Handle different model output formats
                if isinstance(word_result, list):
                    predictions = word_result
                else:
                    predictions = [word_result]

                is_word_toxic = False
                for pred in predictions:
                    label = pred["label"].upper()
                    score = pred["score"]

                    if label in ["TOXIC", "TOXICITY", "HARMFUL", "1"]:
                        if score > self.toxicity_threshold:
                            is_word_toxic = True
                            break
                    elif label in ["NOT_TOXIC", "CLEAN", "0"]:
                        if (1.0 - score) > self.toxicity_threshold:
                            is_word_toxic = True
                            break

                if is_word_toxic:
                    # Replace the word's alphanumeric core, keep punctuation
                    moderated_word = re.sub(r'\w+', replacement, word)
                    moderated_words.append(moderated_word)
                    words_replaced += 1
                else:
                    moderated_words.append(word)

            except Exception:
                # If analysis fails for a word, keep it as is
                moderated_words.append(word)

        return " ".join(moderated_words), words_replaced

    def moderate_batch(self, texts: List[str], replacement: str = "***") -> List[Dict]:
        """
        Moderate multiple texts

        Args:
            texts: List of texts to moderate
            replacement: String to replace inappropriate content with

        Returns:
            List of moderation results
        """
        return [self.moderate(text, replacement) for text in texts]

    def set_threshold(self, threshold: float):
        """
        Set the toxicity threshold

        Args:
            threshold: Threshold between 0 and 1
        """
        if 0 <= threshold <= 1:
            self.toxicity_threshold = threshold
        else:
            raise ValueError("Threshold must be between 0 and 1")
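The replacement step in `_moderate_by_words` masks only the alphanumeric runs of a flagged word, so surrounding punctuation survives. The regex in isolation:

```python
import re

def mask_word(word: str, replacement: str = "***") -> str:
    """Replace each alphanumeric run (\\w+) in the word, keeping punctuation."""
    return re.sub(r"\w+", replacement, word)
```

So a flagged word like `(rude)` becomes `(***)` rather than being replaced wholesale, which keeps the moderated sentence readable.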
@@ -0,0 +1,179 @@
from transformers import pipeline
from typing import Dict, List, Optional, Tuple
from src.config import Config


class NamedEntityRecognizer:
    """Named Entity Recognition using transformers"""

    def __init__(self, model_name: Optional[str] = None):
        """
        Initialize the NER pipeline

        Args:
            model_name: Name of the model to use (optional)
        """
        self.model_name = model_name or Config.get_model("ner")
        print(f"Loading NER model: {self.model_name}")
        self.pipeline = pipeline("ner", model=self.model_name, aggregation_strategy="simple")
        print("NER model loaded successfully!")

        # Entity type mappings for better display
        self.entity_colors = {
            "PER": "👤",      # Person
            "ORG": "🏢",      # Organization
            "LOC": "📍",      # Location
            "MISC": "🏷️",     # Miscellaneous
            "DATE": "📅",     # Date
            "TIME": "⏰",     # Time
            "MONEY": "💰",    # Money
            "PERCENT": "📊",  # Percentage
        }

    def recognize(self, text: str, confidence_threshold: float = 0.9) -> Dict:
        """
        Recognize named entities in text

        Args:
            text: Text to analyze
            confidence_threshold: Minimum confidence score for entities

        Returns:
            Dictionary with entities and their information
        """
        if not text.strip():
            return {"error": "Empty text"}

        try:
            entities = self.pipeline(text)

            # Filter by confidence and process entities
            filtered_entities = []
            entity_stats = {}

            for entity in entities:
                if entity["score"] >= confidence_threshold:
                    entity_type = entity["entity_group"]

                    processed_entity = {
                        "text": entity["word"],
                        "label": entity_type,
                        "confidence": round(entity["score"], 4),
                        "start": entity["start"],
                        "end": entity["end"],
                        "emoji": self.entity_colors.get(entity_type, "🏷️")
                    }

                    filtered_entities.append(processed_entity)

                    # Update statistics
                    if entity_type not in entity_stats:
                        entity_stats[entity_type] = {"count": 0, "entities": []}
                    entity_stats[entity_type]["count"] += 1
                    entity_stats[entity_type]["entities"].append(entity["word"])

            # Create highlighted text
            highlighted_text = self._highlight_entities(text, filtered_entities)

            return {
                "original_text": text,
                "highlighted_text": highlighted_text,
                "entities": filtered_entities,
                "entity_stats": entity_stats,
                "total_entities": len(filtered_entities),
                "confidence_threshold": confidence_threshold
            }

        except Exception as e:
            return {"error": f"NER processing error: {str(e)}"}

    def _highlight_entities(self, text: str, entities: List[Dict]) -> str:
        """
        Create highlighted version of text with entity markers

        Args:
            text: Original text
            entities: List of detected entities

        Returns:
            Text with highlighted entities
        """
        if not entities:
            return text

        # Sort entities by start position (reverse order for replacement)
        sorted_entities = sorted(entities, key=lambda x: x["start"], reverse=True)

        highlighted = text
        for entity in sorted_entities:
            start, end = entity["start"], entity["end"]
            entity_text = entity["text"]
            emoji = entity["emoji"]
            label = entity["label"]
            confidence = entity["confidence"]

            # Create highlighted version
            highlight = f"{emoji}[{entity_text}]({label}:{confidence:.2f})"
            highlighted = highlighted[:start] + highlight + highlighted[end:]

        return highlighted

    def analyze_document(self, text: str, confidence_threshold: float = 0.9) -> Dict:
        """
        Perform comprehensive document analysis with entity extraction

        Args:
            text: Document text to analyze
            confidence_threshold: Minimum confidence for entities

        Returns:
            Comprehensive analysis results
        """
        result = self.recognize(text, confidence_threshold)

        if "error" in result:
            return result

        # Additional analysis
        analysis = {
            **result,
            "document_stats": {
                "word_count": len(text.split()),
                "char_count": len(text),
                "sentence_count": len([s for s in text.split('.') if s.strip()]),
                "entity_density": len(result["entities"]) / len(text.split()) if text.split() else 0
            }
        }

        # Find most common entity types
        if result["entity_stats"]:
            most_common_type = max(result["entity_stats"].items(), key=lambda x: x[1]["count"])
            analysis["most_common_entity_type"] = {
                "type": most_common_type[0],
                "count": most_common_type[1]["count"],
                "emoji": self.entity_colors.get(most_common_type[0], "🏷️")
            }

        return analysis

    def recognize_batch(self, texts: List[str], confidence_threshold: float = 0.9) -> List[Dict]:
        """
        Recognize entities in multiple texts

        Args:
            texts: List of texts to analyze
            confidence_threshold: Minimum confidence for entities

        Returns:
            List of NER results
        """
        return [self.recognize(text, confidence_threshold) for text in texts]

    def get_entity_types(self) -> Dict[str, str]:
        """
        Get available entity types with their emojis

        Returns:
            Dictionary mapping entity types to emojis
        """
        return self.entity_colors.copy()
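`_highlight_entities` rewrites spans from right to left so that the character offsets of earlier entities remain valid after each insertion grows the string. The core idea in isolation, with illustrative `(start, end, label)` tuples instead of the pipeline's entity dicts:

```python
def highlight(text, spans):
    """spans: (start, end, label) tuples; rewrite right-to-left so the
    offsets of not-yet-processed spans are unaffected by earlier edits."""
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        text = text[:start] + f"[{text[start:end]}]({label})" + text[end:]
    return text
```

Processing left to right instead would require shifting every remaining offset by the length added so far; reverse order sidesteps that bookkeeping entirely.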
@@ -0,0 +1,54 @@
from transformers import pipeline
from typing import Dict, List, Optional
from src.config import Config


class SentimentAnalyzer:
    """Sentiment analyzer using transformers"""

    def __init__(self, model_name: Optional[str] = None):
        """
        Initialize the sentiment-analysis pipeline

        Args:
            model_name: Name of the model to use (optional)
        """
        self.model_name = model_name or Config.get_model("sentiment")
        print(f"Loading sentiment model: {self.model_name}")
        self.pipeline = pipeline("sentiment-analysis", model=self.model_name)
        print("Model loaded successfully!")

    def analyze(self, text: str) -> Dict:
        """
        Analyze the sentiment of a text

        Args:
            text: Text to analyze

        Returns:
            Dictionary with label and confidence score
        """
        if not text.strip():
            return {"error": "Empty text"}

        try:
            result = self.pipeline(text)[0]
            return {
                "text": text,
                "sentiment": result["label"],
                "confidence": round(result["score"], 4)
            }
        except Exception as e:
            return {"error": f"Analysis error: {str(e)}"}

    def analyze_batch(self, texts: List[str]) -> List[Dict]:
        """
        Analyze the sentiment of multiple texts

        Args:
            texts: List of texts to analyze

        Returns:
            List of analysis results
        """
        return [self.analyze(text) for text in texts]
@@ -0,0 +1,59 @@
"""
Template for creating new pipelines
Copy this file and adapt it according to your needs
"""
from transformers import pipeline
from typing import Dict, List, Optional


class TemplatePipeline:
    """Template for a new pipeline"""

    def __init__(self, model_name: Optional[str] = None):
        """
        Initialize the pipeline

        Args:
            model_name: Name of the model to use (optional)
        """
        self.model_name = model_name or "distilbert-base-uncased"
        print(f"Loading model {self.model_name}...")

        # Replace "text-classification" with your task
        self.pipeline = pipeline("text-classification", model=self.model_name)
        print("Model loaded successfully!")

    def process(self, text: str) -> Dict:
        """
        Process a text

        Args:
            text: Text to process

        Returns:
            Dictionary with results
        """
        if not text.strip():
            return {"error": "Empty text"}

        try:
            result = self.pipeline(text)
            return {
                "text": text,
                "result": result,
                # Add other fields according to your needs
            }
        except Exception as e:
            return {"error": f"Processing error: {str(e)}"}

    def process_batch(self, texts: List[str]) -> List[Dict]:
        """
        Process multiple texts

        Args:
            texts: List of texts to process

        Returns:
            List of results
        """
        return [self.process(text) for text in texts]
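The template's two reusable patterns, the empty-input guard and list-comprehension batching, can be exercised without loading any model. A hypothetical stand-in class (`EchoPipeline` is invented here purely for illustration) that mirrors `TemplatePipeline`'s control flow:

```python
from typing import Dict, List


class EchoPipeline:
    """Hypothetical stand-in mirroring TemplatePipeline's control flow, no model needed."""

    def process(self, text: str) -> Dict:
        # Same guard as the template: reject blank input early
        if not text.strip():
            return {"error": "Empty text"}
        # Trivial "task" standing in for the transformers pipeline call
        return {"text": text, "result": text.upper()}

    def process_batch(self, texts: List[str]) -> List[Dict]:
        # Same batching pattern as the template
        return [self.process(t) for t in texts]


p = EchoPipeline()
print(p.process_batch(["hi", "   "]))
```

Swapping `text.upper()` for a real `pipeline(...)` call recovers the template's behavior.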
@@ -0,0 +1,82 @@
from transformers import pipeline
from typing import Dict, List, Optional

from src.config import Config


class TextGenerator:
    """Text generator using transformers"""

    def __init__(self, model_name: Optional[str] = None):
        """
        Initialize the text-generation pipeline

        Args:
            model_name: Name of the model to use (optional)
        """
        self.model_name = model_name or Config.get_model("textgen")
        print(f"Loading text generation model: {self.model_name}")
        self.pipeline = pipeline("text-generation", model=self.model_name)
        print("Model loaded successfully!")

    def generate(self, prompt: str, max_length: int = 100, num_return_sequences: int = 1,
                 temperature: float = 1.0, do_sample: bool = True) -> Dict:
        """
        Generate text from a prompt

        Args:
            prompt: Input text prompt
            max_length: Maximum length of generated text
            num_return_sequences: Number of sequences to generate
            temperature: Sampling temperature (higher = more random)
            do_sample: Whether to use sampling

        Returns:
            Dictionary with generated texts
        """
        if not prompt.strip():
            return {"error": "Empty prompt"}

        try:
            results = self.pipeline(
                prompt,
                max_length=max_length,
                num_return_sequences=num_return_sequences,
                temperature=temperature,
                do_sample=do_sample,
                pad_token_id=self.pipeline.tokenizer.eos_token_id
            )

            generations = [
                {
                    "text": result["generated_text"],
                    "continuation": result["generated_text"][len(prompt):].strip()
                }
                for result in results
            ]

            return {
                "prompt": prompt,
                "parameters": {
                    "max_length": max_length,
                    "num_sequences": num_return_sequences,
                    "temperature": temperature,
                    "do_sample": do_sample
                },
                "generations": generations
            }

        except Exception as e:
            return {"error": f"Generation error: {str(e)}"}

    def generate_batch(self, prompts: List[str], **kwargs) -> List[Dict]:
        """
        Generate text for multiple prompts

        Args:
            prompts: List of input prompts
            **kwargs: Generation parameters

        Returns:
            List of generation results
        """
        return [self.generate(prompt, **kwargs) for prompt in prompts]
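The `continuation` field relies on the fact that text-generation pipelines return the prompt followed by the new tokens in `generated_text`. A minimal sketch of that slicing, with a made-up generation string standing in for real model output:

```python
prompt = "Once upon a time"
# Hypothetical model output: prompt echoed back, followed by new tokens
generated_text = "Once upon a time there was a small CLI playground."

# Drop the echoed prompt, then strip the leading space between prompt and continuation
continuation = generated_text[len(prompt):].strip()
print(continuation)  # "there was a small CLI playground."
```

Note this prefix-slicing assumes the pipeline echoes the prompt verbatim, which holds for standard causal LM pipelines but would break if the tokenizer normalizes the prompt text.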