Init commit

commit 9b2a5497d9

@@ -0,0 +1,79 @@

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
ENV/
env/
.venv/
.env/

# PyInstaller
*.manifest
*.spec

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# VS Code
.vscode/

# macOS
.DS_Store

# Logs
*.log

# dotenv
.env
.env.*

# Local settings
local_settings.py

# System files
Thumbs.db
ehthumbs.db
Desktop.ini

@@ -0,0 +1,314 @@

# 🧠 AI Lab – Transformers CLI Playground

> A **pedagogical and technical project** designed for AI practitioners and students to experiment with Hugging Face Transformers through an **interactive Command‑Line Interface (CLI)**.
> This playground provides ready‑to‑use NLP pipelines (Sentiment Analysis, Named Entity Recognition, Text Generation, Fill‑Mask, Moderation, etc.) in a modular, extensible, and educational codebase.

---

## 📚 Overview

The **AI Lab – Transformers CLI Playground** allows you to explore multiple natural language processing tasks directly from the terminal.
Each task (e.g., sentiment, NER, text generation) is implemented as a **Command Module**, which interacts with a **Pipeline Module** built on top of the `transformers` library.

The lab is intentionally structured to demonstrate **clean software design for ML codebases** — with strict separation between configuration, pipelines, CLI logic, and display formatting.

---

## 🗂️ Project Structure

```text
src/
├── __init__.py
├── main.py              # CLI entry point
│
├── cli/
│   ├── __init__.py
│   ├── base.py          # CLICommand base class & interactive shell handler
│   └── display.py       # Console formatting utilities (tables, colors, results)
│
├── commands/            # User-facing commands wrapping pipeline logic
│   ├── __init__.py
│   ├── sentiment.py     # Sentiment analysis command
│   ├── fillmask.py      # Masked token prediction command
│   ├── textgen.py       # Text generation command
│   ├── ner.py           # Named Entity Recognition command
│   └── moderation.py    # Toxicity / content moderation command
│
├── pipelines/           # Machine learning logic (Hugging Face Transformers)
│   ├── __init__.py
│   ├── template.py      # Blueprint for creating new pipelines
│   ├── sentiment.py
│   ├── fillmask.py
│   ├── textgen.py
│   ├── ner.py
│   └── moderation.py
│
└── config/
    ├── __init__.py
    └── settings.py      # Global configuration (default models, parameters)
```

---

## ⚙️ Installation

### 🧾 Option 1 – Using Poetry (Recommended)

> Poetry is used as the main dependency manager.

```bash
# 1. Create and activate a new virtual environment
poetry shell

# 2. Install dependencies
poetry install
```

This automatically installs all dependencies declared in `pyproject.toml`, including **transformers** and **torch**.

To run the CLI inside the Poetry environment:

```bash
poetry run python src/main.py
```

---

### 📦 Option 2 – Using pip and requirements.txt

If you prefer to manage dependencies manually with `requirements.txt`:

```bash
# 1. Create a virtual environment
python -m venv .venv

# 2. Activate it
# Linux/macOS
source .venv/bin/activate
# Windows PowerShell
.venv\Scripts\Activate.ps1

# 3. Install dependencies
pip install -r requirements.txt
```

---

## ▶️ Usage

Once installed, launch the CLI with:

```bash
python -m src.main
# or, if using Poetry
poetry run python src/main.py
```

You’ll see an interactive menu listing the available commands:

```
Welcome to AI Lab - Transformers CLI Playground

Available commands:
  • sentiment  – Analyze the sentiment of a text
  • fillmask   – Predict masked words in a sentence
  • textgen    – Generate text from a prompt
  • ner        – Extract named entities from text
  • moderation – Detect toxic or unsafe content
```

### Example Sessions

#### 🔹 Sentiment Analysis
```text
💬 Enter text: I absolutely love this project!
→ Sentiment: POSITIVE (score: 0.998)
```

#### 🔹 Fill‑Mask
```text
💬 Enter text: The capital of France is [MASK].
→ Predictions:
  1) Paris    score: 0.87
  2) Lyon     score: 0.04
  3) London   score: 0.02
```

#### 🔹 Text Generation
```text
💬 Prompt: Once upon a time
→ Output: Once upon a time there was a young AI learning to code...
```

#### 🔹 NER (Named Entity Recognition)
```text
💬 Enter text: Elon Musk founded SpaceX in California.
→ Entities:
  - Elon Musk    (PERSON)
  - SpaceX       (ORG)
  - California   (LOC)
```

#### 🔹 Moderation
```text
💬 Enter text: I hate everything!
→ Result: FLAGGED (toxic content detected)
```

---

## 🧠 Architecture Overview

The internal structure follows a clean **Command ↔ Pipeline ↔ Display** pattern:

```text
┌──────────────────────┐
│    InteractiveCLI    │
│  (src/cli/base.py)   │
└──────────┬───────────┘
           │
           ▼
  ┌─────────────────┐
  │  Command Layer  │  ← e.g. sentiment.py
  │ (user commands) │
  └───────┬─────────┘
          │
          ▼
  ┌─────────────────┐
  │ Pipeline Layer  │  ← e.g. pipelines/sentiment.py
  │   (ML logic)    │
  └───────┬─────────┘
          │
          ▼
  ┌─────────────────┐
  │  Display Layer  │  ← cli/display.py
  │ (format output) │
  └─────────────────┘
```

### Key Concepts

| Layer | Description |
|-------|-------------|
| **CLI** | Manages user input/output, help menus, and navigation between commands. |
| **Command** | Encapsulates a single user-facing operation (e.g., run sentiment). |
| **Pipeline** | Wraps Hugging Face’s `transformers.pipeline()` to perform inference. |
| **Display** | Handles clean console rendering (colored output, tables, JSON formatting). |
| **Config** | Centralizes model names, limits, and global constants. |
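
The table above can be made concrete with a minimal, runnable sketch of the flow; the class names here are illustrative stand-ins, not the repository's actual implementations:

```python
class EchoPipeline:
    """Pipeline layer: would normally wrap transformers.pipeline()."""

    def analyze(self, text: str) -> dict:
        # Fixed result so the sketch runs without any model download
        return {"sentiment": "POSITIVE", "confidence": 0.99}


class EchoDisplay:
    """Display layer: turns a result dict into console-ready text."""

    @staticmethod
    def render(result: dict) -> str:
        return f"Sentiment: {result['sentiment']} ({result['confidence']:.0%})"


class EchoCommand:
    """Command layer: glues user input, pipeline, and display together."""

    def __init__(self):
        self.pipeline = EchoPipeline()

    def run(self, text: str) -> str:
        result = self.pipeline.analyze(text)
        return EchoDisplay.render(result)


print(EchoCommand().run("I love this project"))  # → Sentiment: POSITIVE (99%)
```

The point of the split is that the command never touches model internals and the pipeline never touches the console.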

---

## ⚙️ Configuration

All configuration is centralized in `src/config/settings.py`.

Example:

```python
class Config:
    DEFAULT_MODELS = {
        "sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
        "fillmask": "bert-base-uncased",
        "textgen": "gpt2",
        "ner": "dslim/bert-base-NER",
        "moderation": "unitary/toxic-bert",
    }
    MAX_LENGTH = 512
    BATCH_SIZE = 8
```

You can easily modify model names to experiment with different checkpoints.
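
For example, a pipeline can resolve its default checkpoint by task name. This sketch assumes the `Config` structure shown above (abbreviated) and a hypothetical `model_for` helper:

```python
class Config:
    # Mirrors the structure shown above (abbreviated)
    DEFAULT_MODELS = {
        "sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
        "textgen": "gpt2",
    }


def model_for(task: str) -> str:
    """Hypothetical helper: resolve the default checkpoint for a task."""
    try:
        return Config.DEFAULT_MODELS[task]
    except KeyError:
        raise ValueError(f"No default model configured for task '{task}'")


print(model_for("textgen"))  # → gpt2
```

Failing loudly on an unknown task name keeps configuration typos from silently falling back to the wrong model.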

---

## 🧩 Extending the Playground

To create a new experiment (e.g., keyword extraction):

1. **Duplicate** `src/pipelines/template.py` → `src/pipelines/keywords.py`
   Implement the `run()` or `analyze()` logic using a new Hugging Face pipeline.

2. **Create a Command** in `src/commands/keywords.py` to interact with users.

3. **Register the command** inside `src/main.py`:

```python
from src.commands.keywords import KeywordsCommand

cli.register_command(KeywordsCommand())
```

4. Optionally, add a model name in `Config.DEFAULT_MODELS`.
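
A minimal sketch of what steps 1–2 might produce (every name here is hypothetical; the real command would subclass `CLICommand` and lazily load its pipeline):

```python
class KeywordsCommand:
    """Hypothetical command for step 2. The real version would subclass
    CLICommand and lazily instantiate a KeywordsExtractor pipeline."""

    @property
    def name(self) -> str:
        return "keywords"

    @property
    def description(self) -> str:
        return "Extract keywords from text"

    def run_on(self, text: str, extractor) -> list:
        # The command layer stays thin: delegate ML work to the pipeline layer
        return extractor.extract(text)


class FakeExtractor:
    """Stand-in for the hypothetical pipelines/keywords.py logic."""

    def extract(self, text: str) -> list:
        return [w for w in text.split() if len(w) > 6]


cmd = KeywordsCommand()
print(cmd.run_on("transformers make keyword extraction simple", FakeExtractor()))
```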

---

## 🧪 Testing

You can use `pytest` for lightweight validation:

```bash
pip install pytest
pytest -q
```

Recommended structure:

```
tests/
├── test_sentiment.py
├── test_textgen.py
└── ...
```
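
A pipeline-layer test can fake the underlying model call so it runs instantly, with no checkpoint download. The `SentimentAnalyzer` below is a simplified stand-in for the repository's actual class:

```python
class FakePipeline:
    """Stand-in for transformers.pipeline(): instant, deterministic results."""

    def __call__(self, text):
        return [{"label": "POSITIVE", "score": 0.99}]


class SentimentAnalyzer:
    """Simplified stand-in for the repository's pipeline class."""

    def __init__(self, pipeline):
        self.pipeline = pipeline

    def analyze(self, text: str) -> dict:
        if not text.strip():
            return {"error": "Empty text"}
        raw = self.pipeline(text)[0]
        return {"sentiment": raw["label"], "confidence": raw["score"]}


def test_analyze_returns_sentiment():
    result = SentimentAnalyzer(FakePipeline()).analyze("great")
    assert result["sentiment"] == "POSITIVE"


def test_empty_text_is_an_error():
    assert "error" in SentimentAnalyzer(FakePipeline()).analyze("   ")
```

Injecting the pipeline through the constructor is what makes this kind of fast test possible.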

---

## 🧰 Troubleshooting

| Issue | Cause / Solution |
|-------|------------------|
| **`transformers` not found** | Check that the virtual environment is activated. |
| **Torch fails to install** | Install the CPU-only build from the PyTorch index. |
| **Models download slowly** | Hugging Face caches models after the first run. |
| **Unicode / accents broken** | Ensure your terminal encoding is UTF‑8. |
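
For the Torch installation issue specifically, the CPU-only build can usually be installed from the official PyTorch wheel index (check the PyTorch install page for the exact command for your platform):

```shell
# CPU-only torch wheel, avoids the large CUDA download
pip install torch --index-url https://download.pytorch.org/whl/cpu
```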

---

## 🧭 Development Guidelines

- Keep **Command** classes lightweight — no ML logic inside them.
- Reuse the **Pipeline Template** for new experiments.
- Format outputs consistently via the `DisplayFormatter`.
- Document all new models or commands in `README.md` and `settings.py`.

---

## 🧱 Roadmap

- [ ] Add non-interactive CLI flags (`--text`, `--task`)
- [ ] Add multilingual model options
- [ ] Add automatic test coverage
- [ ] Add logging and profiling utilities
- [ ] Add JSON/CSV export of results

---

## 🪪 License

You can include a standard open-source license such as **MIT** or **Apache 2.0**, depending on your use case.

---

## 🤝 Contributing

This repository is meant as an **educational sandbox** for experimenting with Transformers.
Pull requests are welcome for new models, better CLI UX, or educational improvements.

---

### ✨ Key Takeaways

- Modular and pedagogical design for training environments
- Clean separation between **I/O**, **ML logic**, and **UX**
- Easily extensible architecture for adding custom pipelines
- A practical sandbox for students, researchers, and developers learning modern NLP tools

---

> 🧩 Built for experimentation. Learn, break, and rebuild.

File diff suppressed because it is too large

@@ -0,0 +1,27 @@

[project]
name = "ai-lab"
version = "0.1.0"
description = "Lab for testing different uses of transformers"
authors = [{ name = "Cyril", email = "decostanzicyril@gmail.com" }]

[tool.poetry]
name = "ai-lab"
version = "0.1.0"
description = "Lab for testing different uses of transformers"
authors = ["Cyril"]
packages = [{ include = "src" }]

[tool.poetry.dependencies]
python = ">=3.12,<3.14"
torch = "^2.0.0"
transformers = "^4.30.0"
tokenizers = "^0.13.0"
numpy = "^1.24.0"
accelerate = "^0.20.0"

[tool.poetry.scripts]
ai-lab = "src.main:main"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

@@ -0,0 +1,4 @@

torch>=2.0.0
transformers>=4.30.0
tokenizers>=0.13.0
numpy>=1.24.0

@@ -0,0 +1,4 @@

"""
|
||||||
|
AI Lab - Transformers Experimentation
|
||||||
|
"""
|
||||||
|
__version__ = "0.1.0"

@@ -0,0 +1,7 @@

"""
|
||||||
|
CLI utilities for AI Lab
|
||||||
|
"""
|
||||||
|
from .base import CLICommand, InteractiveCLI
|
||||||
|
from .display import DisplayFormatter
|
||||||
|
|
||||||
|
__all__ = ['CLICommand', 'InteractiveCLI', 'DisplayFormatter']

@@ -0,0 +1,87 @@

from abc import ABC, abstractmethod
from typing import Dict, Any
from src.config import Config


class CLICommand(ABC):
    """Base class for CLI commands"""

    @property
    @abstractmethod
    def name(self) -> str:
        """Command name"""
        pass

    @property
    @abstractmethod
    def description(self) -> str:
        """Command description"""
        pass

    @abstractmethod
    def run(self) -> None:
        """Execute the command"""
        pass


class InteractiveCLI:
    """Interactive CLI handler"""

    def __init__(self):
        self.commands: Dict[str, CLICommand] = {}

    def register_command(self, command: CLICommand):
        """Register a new command"""
        self.commands[command.name] = command

    def show_menu(self):
        """Display available commands"""
        print(Config.CLI_BANNER)
        print(Config.CLI_SEPARATOR)
        print("Available commands:")
        for name, cmd in self.commands.items():
            print(f"  📌 {name}: {cmd.description}")
        print("  📌 quit: Exit application")
        print("  📌 help: Show this help")
        print("-" * 50)

    def show_help(self):
        """Show detailed help"""
        print("\n📚 Detailed Help")
        print("-" * 30)
        print("Navigation:")
        print("  - Type a command name to execute it")
        print("  - Type 'back' in a command to return to menu")
        print("  - Type 'quit' or Ctrl+C to exit")
        print("\nAvailable commands:")
        for name, cmd in self.commands.items():
            print(f"  {name}: {cmd.description}")

    def run(self):
        """Run the interactive CLI"""
        self.show_menu()

        while True:
            try:
                choice = input("\n💬 Choose a command: ").strip().lower()

                if choice in ['quit', 'exit', 'q']:
                    print("👋 Goodbye!")
                    break

                if choice in ['help', 'h', '?']:
                    self.show_help()
                    continue

                if choice in self.commands:
                    print()  # Empty line for readability
                    self.commands[choice].run()
                    print()  # Empty line after command
                else:
                    print("❌ Unknown command. Type 'help' to see available commands.")

            except KeyboardInterrupt:
                print("\n👋 Stopping program")
                break
            except Exception as e:
                print(f"❌ Error: {e}")

@@ -0,0 +1,192 @@

from typing import Dict, Any


class DisplayFormatter:
    """Utility class for formatting display output"""

    @staticmethod
    def format_sentiment_result(result: Dict[str, Any]) -> str:
        """Format sentiment analysis result for display"""
        if "error" in result:
            return f"❌ {result['error']}"

        sentiment = result["sentiment"]
        confidence = result["confidence"]
        emoji = "😊" if sentiment == "POSITIVE" else "😞"

        return f"{emoji} Sentiment: {sentiment}\n📊 Confidence: {confidence:.2%}"

    @staticmethod
    def show_loading(message: str = "Analysis in progress..."):
        """Show loading message"""
        print(f"\n🔍 {message}")

    @staticmethod
    def show_warning(message: str):
        """Show warning message"""
        print(f"⚠️ {message}")

    @staticmethod
    def show_error(message: str):
        """Show error message"""
        print(f"❌ {message}")

    @staticmethod
    def show_success(message: str):
        """Show success message"""
        print(f"✅ {message}")

    @staticmethod
    def format_fillmask_result(result: Dict[str, Any]) -> str:
        """Format fill-mask prediction result for display"""
        if "error" in result:
            return f"❌ {result['error']}"

        output = []
        output.append(f"📝 Original: {result['original_text']}")
        output.append(f"🎭 Masks found: {result['masks_count']}")
        output.append("")

        if result['masks_count'] == 1:
            # Single mask
            output.append("🔮 Predictions:")
            for i, pred in enumerate(result['predictions'], 1):
                confidence_bar = "█" * int(pred['score'] * 10)
                output.append(f"  {i}. '{pred['token']}' ({pred['score']:.1%}) {confidence_bar}")
                output.append(f"     → {pred['sequence']}")
        else:
            # Multiple masks
            for mask_info in result['predictions']:
                output.append(f"🔮 Mask #{mask_info['mask_position']} predictions:")
                for i, pred in enumerate(mask_info['predictions'], 1):
                    confidence_bar = "█" * int(pred['score'] * 10)
                    output.append(f"  {i}. '{pred['token']}' ({pred['score']:.1%}) {confidence_bar}")
                output.append("")

        return "\n".join(output)

    @staticmethod
    def format_textgen_result(result: Dict[str, Any]) -> str:
        """Format text generation result for display"""
        if "error" in result:
            return f"❌ {result['error']}"

        output = []
        output.append(f"📝 Prompt: {result['prompt']}")
        output.append(f"⚙️ Parameters: max_length={result['parameters']['max_length']}, "
                      f"temperature={result['parameters']['temperature']}")
        output.append("-" * 50)

        for i, gen in enumerate(result['generations'], 1):
            if len(result['generations']) > 1:
                output.append(f"🎯 Generation {i}:")

            output.append(f"📄 Full text: {gen['text']}")
            if gen['continuation']:
                output.append(f"✨ Continuation: {gen['continuation']}")

            if i < len(result['generations']):
                output.append("-" * 30)

        return "\n".join(output)

    @staticmethod
    def format_moderation_result(result: Dict[str, Any]) -> str:
        """Format content moderation result for display"""
        if "error" in result:
            return f"❌ {result['error']}"

        output = []
        output.append(f"📝 Original: {result['original_text']}")

        if result['is_modified']:
            output.append(f"🛡️ Moderated: {result['moderated_text']}")
            output.append(f"⚠️ Status: Content modified ({result['words_replaced']} words replaced)")
            status_emoji = "🔴"
        else:
            output.append("✅ Status: Content approved (no modifications needed)")
            status_emoji = "🟢"

        # Toxicity score bar
        score = result['toxic_score']
        score_bar = "█" * int(score * 10)
        output.append(f"{status_emoji} Toxicity Score: {score:.1%} {score_bar}")

        return "\n".join(output)

    @staticmethod
    def format_ner_result(result: Dict[str, Any]) -> str:
        """Format NER result for display"""
        if "error" in result:
            return f"❌ {result['error']}"

        output = []
        output.append(f"📝 Original: {result['original_text']}")
        output.append(f"✨ Highlighted: {result['highlighted_text']}")
        output.append(f"🎯 Found {result['total_entities']} entities (threshold: {result['confidence_threshold']:.2f})")

        if result['entities']:
            output.append("\n📋 Detected Entities:")
            for entity in result['entities']:
                confidence_bar = "█" * int(entity['confidence'] * 10)
                output.append(f"  {entity['emoji']} {entity['text']} → {entity['label']} "
                              f"({entity['confidence']:.1%}) {confidence_bar}")

        if result['entity_stats']:
            output.append("\n📊 Entity Statistics:")
            for entity_type, stats in result['entity_stats'].items():
                unique_entities = list(set(stats['entities']))
                emoji = result['entities'][0]['emoji'] if result['entities'] else "🏷️"
                for ent in result['entities']:
                    if ent['label'] == entity_type:
                        emoji = ent['emoji']
                        break

                output.append(f"  {emoji} {entity_type}: {stats['count']} occurrences")
                if len(unique_entities) <= 3:
                    output.append(f"     → {', '.join(unique_entities)}")
                else:
                    output.append(f"     → {', '.join(unique_entities[:3])}... (+{len(unique_entities)-3} more)")

        return "\n".join(output)

    @staticmethod
    def format_ner_analysis(result: Dict[str, Any]) -> str:
        """Format comprehensive NER document analysis"""
        if "error" in result:
            return f"❌ {result['error']}"

        output = []
        output.append("📊 Document Analysis Results")
        output.append("=" * 50)

        # Document statistics
        stats = result['document_stats']
        output.append(f"📄 Document: {stats['word_count']} words, {stats['char_count']} characters")
        output.append(f"📝 Structure: ~{stats['sentence_count']} sentences")
        output.append(f"🎯 Entity Density: {stats['entity_density']:.2%} (entities per word)")

        # Most common entity type
        if 'most_common_entity_type' in result:
            common = result['most_common_entity_type']
            output.append(f"🏆 Most Common: {common['emoji']} {common['type']} ({common['count']} occurrences)")

        output.append("\n✨ Highlighted Text:")
        output.append(result['highlighted_text'])

        if result['entity_stats']:
            output.append("\n📈 Detailed Statistics:")
            for entity_type, stats in result['entity_stats'].items():
                unique_entities = list(set(stats['entities']))
                emoji = "🏷️"
                for ent in result['entities']:
                    if ent['label'] == entity_type:
                        emoji = ent['emoji']
                        break

                output.append(f"\n{emoji} {entity_type} ({stats['count']} total):")
                for entity in unique_entities:
                    count = stats['entities'].count(entity)
                    output.append(f"  • {entity} ({count}x)")

        return "\n".join(output)

@@ -0,0 +1,10 @@

"""
|
||||||
|
AI Lab commands
|
||||||
|
"""
|
||||||
|
from .sentiment import SentimentCommand
|
||||||
|
from .fillmask import FillMaskCommand
|
||||||
|
from .textgen import TextGenCommand
|
||||||
|
from .moderation import ModerationCommand
|
||||||
|
from .ner import NERCommand
|
||||||
|
|
||||||
|
__all__ = ['SentimentCommand', 'FillMaskCommand', 'TextGenCommand', 'ModerationCommand', 'NERCommand']

@@ -0,0 +1,84 @@

from src.cli.base import CLICommand
from src.cli.display import DisplayFormatter
from src.pipelines.fillmask import FillMaskAnalyzer


class FillMaskCommand(CLICommand):
    """Interactive fill-mask prediction command"""

    def __init__(self):
        self.analyzer = None

    @property
    def name(self) -> str:
        return "fillmask"

    @property
    def description(self) -> str:
        return "Interactive fill-mask token prediction"

    def _initialize_analyzer(self):
        """Lazy initialization of the analyzer"""
        if self.analyzer is None:
            print("🔄 Loading fill-mask model...")
            self.analyzer = FillMaskAnalyzer()
            DisplayFormatter.show_success("Model loaded!")

    def _show_instructions(self):
        """Show usage instructions"""
        print("\n📝 Fill-Mask Prediction")
        print("Replace words with [MASK] token and get predictions")
        print("\nExamples:")
        print("  - The weather today is [MASK]")
        print("  - I love to [MASK] music")
        print("  - Paris is the capital of [MASK]")
        print("\nType 'back' to return to main menu")
        print("Type 'help' to see these instructions again")
        print("-" * 50)

    def _get_top_k(self) -> int:
        """Get number of predictions from user"""
        while True:
            try:
                top_k_input = input("📊 Number of predictions (1-10, default=5): ").strip()
                if not top_k_input:
                    return 5

                top_k = int(top_k_input)
                if 1 <= top_k <= 10:
                    return top_k
                else:
                    DisplayFormatter.show_warning("Please enter a number between 1 and 10")
            except ValueError:
                DisplayFormatter.show_warning("Please enter a valid number")

    def run(self):
        """Run interactive fill-mask prediction"""
        self._initialize_analyzer()
        self._show_instructions()

        while True:
            text = input("\n💬 Enter text with [MASK]: ").strip()

            if text.lower() in ['back', 'return']:
                break

            if text.lower() == 'help':
                self._show_instructions()
                continue

            if not text:
                DisplayFormatter.show_warning("Please enter some text")
                continue

            if "[MASK]" not in text:
                DisplayFormatter.show_warning("Text must contain [MASK] token")
                continue

            # Get number of predictions
            top_k = self._get_top_k()

            DisplayFormatter.show_loading("Predicting tokens...")
            result = self.analyzer.predict(text, top_k=top_k)
            formatted_result = DisplayFormatter.format_fillmask_result(result)
            print(formatted_result)

@@ -0,0 +1,73 @@

from src.cli.base import CLICommand
from src.cli.display import DisplayFormatter
from src.pipelines.moderation import ContentModerator


class ModerationCommand(CLICommand):
    """Interactive content moderation command"""

    def __init__(self):
        self.moderator = None

    @property
    def name(self) -> str:
        return "moderation"

    @property
    def description(self) -> str:
        return "Content moderation and filtering"

    def _initialize_moderator(self):
        """Lazy initialization of the moderator"""
        if self.moderator is None:
            print("🔄 Loading content moderation model...")
            self.moderator = ContentModerator()
            DisplayFormatter.show_success("Moderation model loaded!")

    def run(self):
        """Run interactive content moderation"""
        self._initialize_moderator()

        print("\n🛡️ Content Moderation")
        print("Type 'back' to return to main menu")
        print("Type 'settings' to adjust moderation sensitivity")
        print("-" * 40)

        while True:
            text = input("\n📝 Enter text to moderate: ").strip()

            if text.lower() in ['back', 'return']:
                break

            if text.lower() == 'settings':
                self._show_settings()
                continue

            if not text:
                DisplayFormatter.show_warning("Please enter some text")
                continue

            DisplayFormatter.show_loading("Analyzing content...")
            result = self.moderator.moderate(text)
            formatted_result = DisplayFormatter.format_moderation_result(result)
            print(formatted_result)

    def _show_settings(self):
        """Show and allow modification of moderation settings"""
        print("\n⚙️ Current Settings:")
        print(f"Toxicity threshold: {self.moderator.toxicity_threshold:.2f}")
        print("\nOptions:")
        print("1. Change threshold (0.0 = very strict, 1.0 = very permissive)")
        print("2. Back to moderation")

        choice = input("\nChoose option (1-2): ").strip()

        if choice == "1":
            try:
                new_threshold = float(input("Enter new threshold (0.0-1.0): "))
                self.moderator.set_threshold(new_threshold)
                DisplayFormatter.show_success(f"Threshold set to {new_threshold:.2f}")
            except ValueError:
                DisplayFormatter.show_error("Invalid threshold value")
        elif choice != "2":
            DisplayFormatter.show_warning("Invalid option")

@@ -0,0 +1,137 @@

from src.cli.base import CLICommand
from src.cli.display import DisplayFormatter
from src.pipelines.ner import NamedEntityRecognizer


class NERCommand(CLICommand):
    """Interactive Named Entity Recognition command"""

    def __init__(self):
        self.recognizer = None
        self.confidence_threshold = 0.9

    @property
    def name(self) -> str:
        return "ner"

    @property
    def description(self) -> str:
        return "Named Entity Recognition - Extract people, places, organizations"

    def _initialize_recognizer(self):
        """Lazy initialization of the recognizer"""
        if self.recognizer is None:
            print("🔄 Loading NER model...")
            self.recognizer = NamedEntityRecognizer()
            DisplayFormatter.show_success("NER model loaded!")

    def _show_instructions(self):
        """Show usage instructions and examples"""
        print("\n🎯 Named Entity Recognition")
        print("Extract and classify entities like people, organizations, locations, etc.")
        print("\n📝 Examples to try:")
        print("  - Apple Inc. was founded by Steve Jobs in Cupertino, California.")
        print("  - Barack Obama visited Paris in 2015 to meet Emmanuel Macron.")
        print("  - Microsoft acquired GitHub for $7.5 billion in June 2018.")
        print("\n🎛️ Commands:")
        print("  'back' - Return to main menu")
        print("  'help' - Show these instructions")
        print("  'settings' - Adjust confidence threshold")
        print("  'types' - Show entity types")
        print("  'analyze' - Detailed document analysis mode")
        print("-" * 60)

    def _show_entity_types(self):
        """Show available entity types"""
        entity_types = self.recognizer.get_entity_types()
        print("\n🏷️ Entity Types:")
        type_descriptions = {
            "PER": "Person names",
            "ORG": "Organizations, companies",
            "LOC": "Locations, places",
            "MISC": "Miscellaneous entities",
            "DATE": "Dates and time periods",
            "TIME": "Specific times",
            "MONEY": "Monetary amounts",
            "PERCENT": "Percentages"
        }
|
|
||||||
|
for entity_type, emoji in entity_types.items():
|
||||||
|
description = type_descriptions.get(entity_type, "Other entities")
|
||||||
|
print(f" {emoji} {entity_type}: {description}")
|
||||||
|
|
||||||
|
def _adjust_settings(self):
|
||||||
|
"""Allow user to adjust confidence threshold"""
|
||||||
|
print(f"\n⚙️ Current confidence threshold: {self.confidence_threshold:.2f}")
|
||||||
|
print("Lower values = more entities detected (but less accurate)")
|
||||||
|
print("Higher values = fewer entities detected (but more accurate)")
|
||||||
|
|
||||||
|
try:
|
||||||
|
new_threshold = input(f"Enter new threshold (0.1-1.0, current: {self.confidence_threshold}): ").strip()
|
||||||
|
if new_threshold:
|
||||||
|
threshold = float(new_threshold)
|
||||||
|
if 0.1 <= threshold <= 1.0:
|
||||||
|
self.confidence_threshold = threshold
|
||||||
|
DisplayFormatter.show_success(f"Threshold set to {threshold:.2f}")
|
||||||
|
else:
|
||||||
|
DisplayFormatter.show_warning("Threshold must be between 0.1 and 1.0")
|
||||||
|
except ValueError:
|
||||||
|
DisplayFormatter.show_error("Invalid threshold value")
|
||||||
|
|
||||||
|
def _analyze_mode(self):
|
||||||
|
"""Document analysis mode with detailed statistics"""
|
||||||
|
print("\n📊 Document Analysis Mode")
|
||||||
|
print("Enter longer text for comprehensive entity analysis")
|
||||||
|
print("Type 'done' when finished")
|
||||||
|
print("-" * 40)
|
||||||
|
|
||||||
|
lines = []
|
||||||
|
while True:
|
||||||
|
line = input("📝 ").strip()
|
||||||
|
if line.lower() == 'done':
|
||||||
|
break
|
||||||
|
if line:
|
||||||
|
lines.append(line)
|
||||||
|
|
||||||
|
if not lines:
|
||||||
|
DisplayFormatter.show_warning("No text entered")
|
||||||
|
return
|
||||||
|
|
||||||
|
document = " ".join(lines)
|
||||||
|
DisplayFormatter.show_loading("Analyzing document...")
|
||||||
|
|
||||||
|
result = self.recognizer.analyze_document(document, self.confidence_threshold)
|
||||||
|
formatted_result = DisplayFormatter.format_ner_analysis(result)
|
||||||
|
print(formatted_result)
|
||||||
|
|
||||||
|
def run(self):
|
||||||
|
"""Run interactive NER"""
|
||||||
|
self._initialize_recognizer()
|
||||||
|
self._show_instructions()
|
||||||
|
|
||||||
|
while True:
|
||||||
|
text = input("\n💬 Enter text to analyze: ").strip()
|
||||||
|
|
||||||
|
if text.lower() == 'back':
|
||||||
|
break
|
||||||
|
elif text.lower() == 'help':
|
||||||
|
self._show_instructions()
|
||||||
|
continue
|
||||||
|
elif text.lower() == 'settings':
|
||||||
|
self._adjust_settings()
|
||||||
|
continue
|
||||||
|
elif text.lower() == 'types':
|
||||||
|
self._show_entity_types()
|
||||||
|
continue
|
||||||
|
elif text.lower() == 'analyze':
|
||||||
|
self._analyze_mode()
|
||||||
|
continue
|
||||||
|
|
||||||
|
if not text:
|
||||||
|
DisplayFormatter.show_warning("Please enter some text")
|
||||||
|
continue
|
||||||
|
|
||||||
|
DisplayFormatter.show_loading("Extracting entities...")
|
||||||
|
result = self.recognizer.recognize(text, self.confidence_threshold)
|
||||||
|
formatted_result = DisplayFormatter.format_ner_result(result)
|
||||||
|
print(formatted_result)
|
||||||
|
|
@@ -0,0 +1,48 @@
from src.cli.base import CLICommand
from src.cli.display import DisplayFormatter
from src.pipelines.sentiment import SentimentAnalyzer


class SentimentCommand(CLICommand):
    """Interactive sentiment analysis command"""

    def __init__(self):
        self.analyzer = None

    @property
    def name(self) -> str:
        return "sentiment"

    @property
    def description(self) -> str:
        return "Interactive sentiment analysis"

    def _initialize_analyzer(self):
        """Lazy initialization of the analyzer"""
        if self.analyzer is None:
            print("🔄 Loading sentiment model...")
            self.analyzer = SentimentAnalyzer()
            DisplayFormatter.show_success("Model loaded!")

    def run(self):
        """Run interactive sentiment analysis"""
        self._initialize_analyzer()

        print("\n📝 Sentiment Analysis")
        print("Type 'back' to return to main menu")
        print("-" * 30)

        while True:
            text = input("\n💬 Enter your text: ").strip()

            if text.lower() in ['back', 'return']:
                break

            if not text:
                DisplayFormatter.show_warning("Please enter some text")
                continue

            DisplayFormatter.show_loading()
            result = self.analyzer.analyze(text)
            formatted_result = DisplayFormatter.format_sentiment_result(result)
            print(formatted_result)
@@ -0,0 +1,95 @@
from src.cli.base import CLICommand
from src.cli.display import DisplayFormatter
from src.pipelines.textgen import TextGenerator


class TextGenCommand(CLICommand):
    """Interactive text generation command"""

    def __init__(self):
        self.generator = None
        self.default_params = {
            'max_length': 100,
            'num_return_sequences': 1,
            'temperature': 1.0,
            'do_sample': True
        }

    @property
    def name(self) -> str:
        return "textgen"

    @property
    def description(self) -> str:
        return "Interactive text generation"

    def _initialize_generator(self):
        """Lazy initialization of the generator"""
        if self.generator is None:
            print("🔄 Loading text generation model...")
            self.generator = TextGenerator()
            DisplayFormatter.show_success("Model loaded!")

    def _show_parameters(self):
        """Show current generation parameters"""
        print("\n⚙️ Current parameters:")
        for key, value in self.default_params.items():
            print(f"  {key}: {value}")

    def _update_parameters(self):
        """Allow user to update generation parameters"""
        print("\n🔧 Update parameters (press Enter to keep current value):")

        try:
            max_length = input(f"Max length ({self.default_params['max_length']}): ").strip()
            if max_length:
                self.default_params['max_length'] = int(max_length)

            num_sequences = input(f"Number of sequences ({self.default_params['num_return_sequences']}): ").strip()
            if num_sequences:
                self.default_params['num_return_sequences'] = int(num_sequences)

            temperature = input(f"Temperature ({self.default_params['temperature']}): ").strip()
            if temperature:
                self.default_params['temperature'] = float(temperature)

            do_sample = input(f"Use sampling ({self.default_params['do_sample']}): ").strip().lower()
            if do_sample in ['true', 'false']:
                self.default_params['do_sample'] = do_sample == 'true'

            DisplayFormatter.show_success("Parameters updated!")

        except ValueError as e:
            DisplayFormatter.show_error(f"Invalid parameter value: {e}")

    def run(self):
        """Run interactive text generation"""
        self._initialize_generator()

        print("\n📝 Text Generation")
        print("Commands:")
        print("  'back' - Return to main menu")
        print("  'params' - Show current parameters")
        print("  'config' - Update parameters")
        print("-" * 40)

        while True:
            prompt = input("\n💬 Enter your prompt: ").strip()

            if prompt.lower() == 'back':
                break
            elif prompt.lower() == 'params':
                self._show_parameters()
                continue
            elif prompt.lower() == 'config':
                self._update_parameters()
                continue

            if not prompt:
                DisplayFormatter.show_warning("Please enter a prompt")
                continue

            DisplayFormatter.show_loading("Generating text...")
            result = self.generator.generate(prompt, **self.default_params)
            formatted_result = DisplayFormatter.format_textgen_result(result)
            print(formatted_result)
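The `temperature` parameter above rescales the model's logits before sampling; `do_sample=False` switches to greedy decoding, where temperature has no effect. A minimal, self-contained sketch of the temperature effect (illustrative only, not the generator's actual code):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities; temperature < 1 sharpens the
    distribution, temperature > 1 flattens it toward uniform."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

At `temperature=0.5` the top token's probability grows relative to `temperature=2.0`, which is why low temperatures produce more repetitive, deterministic text.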
@@ -0,0 +1,6 @@
"""
Project configuration
"""
from .settings import Config

__all__ = ['Config']
@@ -0,0 +1,40 @@
"""
Global project configuration
"""
from pathlib import Path
from typing import Dict, Any


class Config:
    """Global application configuration"""

    # Paths
    PROJECT_ROOT = Path(__file__).parent.parent.parent
    SRC_DIR = PROJECT_ROOT / "src"

    # Default models
    DEFAULT_MODELS = {
        "sentiment": "cardiffnlp/twitter-roberta-base-sentiment-latest",
        "fillmask": "distilbert-base-uncased",
        "textgen": "gpt2",
        "moderation": "unitary/toxic-bert",
        "ner": "dbmdz/bert-large-cased-finetuned-conll03-english",
    }

    # Interface
    CLI_BANNER = "🤖 AI Lab - Transformers Experimentation"
    CLI_SEPARATOR = "=" * 50

    # Performance
    MAX_BATCH_SIZE = 32
    DEFAULT_MAX_LENGTH = 512

    @classmethod
    def get_model(cls, pipeline_name: str) -> str:
        """Get default model for a pipeline"""
        return cls.DEFAULT_MODELS.get(pipeline_name, "")

    @classmethod
    def get_all_models(cls) -> Dict[str, str]:
        """Get all configured models"""
        return cls.DEFAULT_MODELS.copy()
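`Config.get_model` falls back to an empty string rather than raising `KeyError` for unknown pipeline names. A quick sketch of that lookup behaviour, using a stand-in dict so it runs without the rest of the project:

```python
# Stand-in for Config.DEFAULT_MODELS, showing the fallback in Config.get_model:
# unknown pipeline names yield "" instead of raising KeyError.
DEFAULT_MODELS = {
    "sentiment": "cardiffnlp/twitter-roberta-base-sentiment-latest",
    "textgen": "gpt2",
}

def get_model(pipeline_name: str) -> str:
    # dict.get with a default mirrors the classmethod above
    return DEFAULT_MODELS.get(pipeline_name, "")
```

Callers therefore need to treat an empty string as "no model configured", not as a valid model id.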
@@ -0,0 +1,38 @@
#!/usr/bin/env python3
"""
CLI entry point for AI Lab
"""
import sys
from pathlib import Path

# Add parent directory to PYTHONPATH
sys.path.insert(0, str(Path(__file__).parent.parent))

from src.cli import InteractiveCLI
from src.commands import SentimentCommand, FillMaskCommand, TextGenCommand, ModerationCommand, NERCommand


def main():
    """Main CLI function"""
    try:
        # Create CLI interface
        cli = InteractiveCLI()

        # Register available commands
        cli.register_command(SentimentCommand())
        cli.register_command(FillMaskCommand())
        cli.register_command(TextGenCommand())
        cli.register_command(ModerationCommand())
        cli.register_command(NERCommand())

        # Launch interactive interface
        cli.run()

    except KeyboardInterrupt:
        print("\n👋 Stopping program")
    except Exception as e:
        print(f"❌ Error: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
@@ -0,0 +1,11 @@
"""
Experimentation pipelines with transformers
"""
from .sentiment import SentimentAnalyzer
from .fillmask import FillMaskAnalyzer
from .textgen import TextGenerator
from .moderation import ContentModerator
from .ner import NamedEntityRecognizer
from .template import TemplatePipeline

__all__ = ['SentimentAnalyzer', 'FillMaskAnalyzer', 'TextGenerator', 'ContentModerator', 'NamedEntityRecognizer', 'TemplatePipeline']
@@ -0,0 +1,95 @@
from transformers import pipeline
from typing import Dict, List, Optional
from src.config import Config


class FillMaskAnalyzer:
    """Fill-mask analyzer using transformers"""

    def __init__(self, model_name: Optional[str] = None):
        """
        Initialize the fill-mask pipeline

        Args:
            model_name: Name of the model to use (optional)
        """
        self.model_name = model_name or Config.get_model("fillmask")
        print(f"Loading fill-mask model: {self.model_name}")
        self.pipeline = pipeline("fill-mask", model=self.model_name)
        print("Model loaded successfully!")

    def predict(self, text: str, top_k: int = 5) -> Dict:
        """
        Predict masked tokens in text

        Args:
            text: Text with [MASK] token(s) to predict
            top_k: Number of top predictions to return

        Returns:
            Dictionary with predictions and scores
        """
        if not text.strip():
            return {"error": "Empty text"}

        if "[MASK]" not in text:
            return {"error": "Text must contain [MASK] token"}

        try:
            results = self.pipeline(text, top_k=top_k)

            # Handle single mask vs multiple masks
            if isinstance(results, list) and isinstance(results[0], list):
                # Multiple masks
                predictions = []
                for i, mask_results in enumerate(results):
                    mask_predictions = [
                        {
                            "token": pred["token_str"],
                            "score": round(pred["score"], 4),
                            "sequence": pred["sequence"]
                        }
                        for pred in mask_results
                    ]
                    predictions.append({
                        "mask_position": i + 1,
                        "predictions": mask_predictions
                    })

                return {
                    "original_text": text,
                    "masks_count": len(results),
                    "predictions": predictions
                }
            else:
                # Single mask
                predictions = [
                    {
                        "token": pred["token_str"],
                        "score": round(pred["score"], 4),
                        "sequence": pred["sequence"]
                    }
                    for pred in results
                ]

                return {
                    "original_text": text,
                    "masks_count": 1,
                    "predictions": predictions
                }

        except Exception as e:
            return {"error": f"Prediction error: {str(e)}"}

    def predict_batch(self, texts: List[str], top_k: int = 5) -> List[Dict]:
        """
        Predict masked tokens for multiple texts

        Args:
            texts: List of texts with [MASK] tokens
            top_k: Number of top predictions to return

        Returns:
            List of prediction results
        """
        return [self.predict(text, top_k) for text in texts]
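The branch in `predict` exists because the fill-mask pipeline returns a flat list of predictions when the text contains one `[MASK]`, and a list of per-mask lists when it contains several. A minimal sketch of that shape normalization, on plain data with no model:

```python
def normalize_mask_results(results):
    """Wrap single-mask output so both shapes become a list of per-mask lists."""
    if results and isinstance(results[0], list):
        return results   # multiple masks: already one list per mask
    return [results]     # single mask: wrap the flat list
```

Normalizing up front like this would let both branches of `predict` share one loop instead of duplicating the prediction-dict construction.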
@@ -0,0 +1,174 @@
from transformers import pipeline
from typing import Dict, List, Optional
import re
from src.config import Config


class ContentModerator:
    """Content moderator that detects and replaces inappropriate content"""

    def __init__(self, model_name: Optional[str] = None):
        """
        Initialize the content moderation pipeline

        Args:
            model_name: Name of the model to use (optional)
        """
        self.model_name = model_name or Config.get_model("moderation")
        print(f"Loading moderation model: {self.model_name}")
        self.classifier = pipeline("text-classification", model=self.model_name)
        print("Moderation model loaded successfully!")

        # Threshold above which content is considered toxic
        self.toxicity_threshold = 0.5

    def moderate(self, text: str, replacement: str = "***") -> Dict:
        """
        Moderate content by detecting and replacing inappropriate words

        Args:
            text: Text to moderate
            replacement: String to replace inappropriate content with

        Returns:
            Dictionary with original text, moderated text, and detection info
        """
        if not text.strip():
            return {"error": "Empty text"}

        try:
            # First, check overall toxicity
            result = self.classifier(text)

            # Handle different model output formats
            if isinstance(result, list):
                predictions = result
            else:
                predictions = [result]

            # Find toxicity score
            toxic_score = 0.0
            is_toxic = False

            for pred in predictions:
                label = pred["label"].upper()
                score = pred["score"]

                # Check different possible toxic labels
                if label in ["TOXIC", "TOXICITY", "HARMFUL", "1"]:
                    toxic_score = max(toxic_score, score)
                    if score > self.toxicity_threshold:
                        is_toxic = True
                elif label in ["NOT_TOXIC", "CLEAN", "0"]:
                    # For models where a high score means NOT toxic
                    toxic_score = max(toxic_score, 1.0 - score)
                    if (1.0 - score) > self.toxicity_threshold:
                        is_toxic = True

            if not is_toxic:
                return {
                    "original_text": text,
                    "moderated_text": text,
                    "is_modified": False,
                    "toxic_score": toxic_score,
                    "words_replaced": 0
                }

            # If toxic, analyze word by word to find problematic parts
            moderated_text, words_replaced = self._moderate_by_words(text, replacement)

            return {
                "original_text": text,
                "moderated_text": moderated_text,
                "is_modified": True,
                "toxic_score": toxic_score,
                "words_replaced": words_replaced
            }

        except Exception as e:
            return {"error": f"Moderation error: {str(e)}"}

    def _moderate_by_words(self, text: str, replacement: str) -> tuple[str, int]:
        """
        Moderate text by analyzing individual words and phrases

        Args:
            text: Original text
            replacement: Replacement string

        Returns:
            Tuple of (moderated_text, words_replaced_count)
        """
        words = text.split()
        moderated_words = []
        words_replaced = 0

        # Check individual words
        for word in words:
            # Clean word for analysis (remove punctuation)
            clean_word = re.sub(r'[^\w]', '', word)
            if not clean_word:
                moderated_words.append(word)
                continue

            try:
                word_result = self.classifier(clean_word)

                # Handle different model output formats
                if isinstance(word_result, list):
                    predictions = word_result
                else:
                    predictions = [word_result]

                is_word_toxic = False
                for pred in predictions:
                    label = pred["label"].upper()
                    score = pred["score"]

                    if label in ["TOXIC", "TOXICITY", "HARMFUL", "1"]:
                        if score > self.toxicity_threshold:
                            is_word_toxic = True
                            break
                    elif label in ["NOT_TOXIC", "CLEAN", "0"]:
                        if (1.0 - score) > self.toxicity_threshold:
                            is_word_toxic = True
                            break

                if is_word_toxic:
                    # Replace the word's alphanumeric core, keep punctuation
                    moderated_word = re.sub(r'\w+', replacement, word)
                    moderated_words.append(moderated_word)
                    words_replaced += 1
                else:
                    moderated_words.append(word)

            except Exception:
                # If analysis fails for a word, keep it as is
                moderated_words.append(word)

        return " ".join(moderated_words), words_replaced

    def moderate_batch(self, texts: List[str], replacement: str = "***") -> List[Dict]:
        """
        Moderate multiple texts

        Args:
            texts: List of texts to moderate
            replacement: String to replace inappropriate content with

        Returns:
            List of moderation results
        """
        return [self.moderate(text, replacement) for text in texts]

    def set_threshold(self, threshold: float):
        """
        Set the toxicity threshold

        Args:
            threshold: Threshold between 0 and 1
        """
        if 0 <= threshold <= 1:
            self.toxicity_threshold = threshold
        else:
            raise ValueError("Threshold must be between 0 and 1")
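The replacement step in `_moderate_by_words` masks only the alphanumeric runs of a flagged word, so surrounding punctuation survives. The regex in isolation:

```python
import re

def mask_word(word: str, replacement: str = "***") -> str:
    """Replace each alphanumeric run (\\w+) in the word, keeping punctuation."""
    return re.sub(r"\w+", replacement, word)
```

So a flagged word like `(rude)` becomes `(***)` rather than being replaced wholesale, which keeps the moderated sentence readable.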
@@ -0,0 +1,179 @@
from transformers import pipeline
from typing import Dict, List, Optional, Tuple
from src.config import Config


class NamedEntityRecognizer:
    """Named Entity Recognition using transformers"""

    def __init__(self, model_name: Optional[str] = None):
        """
        Initialize the NER pipeline

        Args:
            model_name: Name of the model to use (optional)
        """
        self.model_name = model_name or Config.get_model("ner")
        print(f"Loading NER model: {self.model_name}")
        self.pipeline = pipeline("ner", model=self.model_name, aggregation_strategy="simple")
        print("NER model loaded successfully!")

        # Entity type mappings for better display
        self.entity_colors = {
            "PER": "👤",      # Person
            "ORG": "🏢",      # Organization
            "LOC": "📍",      # Location
            "MISC": "🏷️",     # Miscellaneous
            "DATE": "📅",     # Date
            "TIME": "⏰",     # Time
            "MONEY": "💰",    # Money
            "PERCENT": "📊",  # Percentage
        }

    def recognize(self, text: str, confidence_threshold: float = 0.9) -> Dict:
        """
        Recognize named entities in text

        Args:
            text: Text to analyze
            confidence_threshold: Minimum confidence score for entities

        Returns:
            Dictionary with entities and their information
        """
        if not text.strip():
            return {"error": "Empty text"}

        try:
            entities = self.pipeline(text)

            # Filter by confidence and process entities
            filtered_entities = []
            entity_stats = {}

            for entity in entities:
                if entity["score"] >= confidence_threshold:
                    entity_type = entity["entity_group"]

                    processed_entity = {
                        "text": entity["word"],
                        "label": entity_type,
                        "confidence": round(entity["score"], 4),
                        "start": entity["start"],
                        "end": entity["end"],
                        "emoji": self.entity_colors.get(entity_type, "🏷️")
                    }

                    filtered_entities.append(processed_entity)

                    # Update statistics
                    if entity_type not in entity_stats:
                        entity_stats[entity_type] = {"count": 0, "entities": []}
                    entity_stats[entity_type]["count"] += 1
                    entity_stats[entity_type]["entities"].append(entity["word"])

            # Create highlighted text
            highlighted_text = self._highlight_entities(text, filtered_entities)

            return {
                "original_text": text,
                "highlighted_text": highlighted_text,
                "entities": filtered_entities,
                "entity_stats": entity_stats,
                "total_entities": len(filtered_entities),
                "confidence_threshold": confidence_threshold
            }

        except Exception as e:
            return {"error": f"NER processing error: {str(e)}"}

    def _highlight_entities(self, text: str, entities: List[Dict]) -> str:
        """
        Create highlighted version of text with entity markers

        Args:
            text: Original text
            entities: List of detected entities

        Returns:
            Text with highlighted entities
        """
        if not entities:
            return text

        # Sort entities by start position (reverse order for replacement)
        sorted_entities = sorted(entities, key=lambda x: x["start"], reverse=True)

        highlighted = text
        for entity in sorted_entities:
            start, end = entity["start"], entity["end"]
            entity_text = entity["text"]
            emoji = entity["emoji"]
            label = entity["label"]
            confidence = entity["confidence"]

            # Create highlighted version
            highlight = f"{emoji}[{entity_text}]({label}:{confidence:.2f})"
            highlighted = highlighted[:start] + highlight + highlighted[end:]

        return highlighted

    def analyze_document(self, text: str, confidence_threshold: float = 0.9) -> Dict:
        """
        Perform comprehensive document analysis with entity extraction

        Args:
            text: Document text to analyze
            confidence_threshold: Minimum confidence for entities

        Returns:
            Comprehensive analysis results
        """
        result = self.recognize(text, confidence_threshold)

        if "error" in result:
            return result

        # Additional analysis
        analysis = {
            **result,
            "document_stats": {
                "word_count": len(text.split()),
                "char_count": len(text),
                "sentence_count": len([s for s in text.split('.') if s.strip()]),
                "entity_density": len(result["entities"]) / len(text.split()) if text.split() else 0
            }
        }

        # Find most common entity types
        if result["entity_stats"]:
            most_common_type = max(result["entity_stats"].items(), key=lambda x: x[1]["count"])
            analysis["most_common_entity_type"] = {
                "type": most_common_type[0],
                "count": most_common_type[1]["count"],
                "emoji": self.entity_colors.get(most_common_type[0], "🏷️")
            }

        return analysis

    def recognize_batch(self, texts: List[str], confidence_threshold: float = 0.9) -> List[Dict]:
        """
        Recognize entities in multiple texts

        Args:
            texts: List of texts to analyze
            confidence_threshold: Minimum confidence for entities

        Returns:
            List of NER results
        """
        return [self.recognize(text, confidence_threshold) for text in texts]

    def get_entity_types(self) -> Dict[str, str]:
        """
        Get available entity types with their emojis

        Returns:
            Dictionary mapping entity types to emojis
        """
        return self.entity_colors.copy()
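`_highlight_entities` rewrites spans from right to left so that the character offsets of earlier entities remain valid after each insertion grows the string. The core idea in isolation, with illustrative `(start, end, label)` tuples instead of the pipeline's entity dicts:

```python
def highlight(text, spans):
    """spans: (start, end, label) tuples; rewrite right-to-left so the
    offsets of not-yet-processed spans are unaffected by earlier edits."""
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        text = text[:start] + f"[{text[start:end]}]({label})" + text[end:]
    return text
```

Processing left to right instead would require shifting every remaining offset by the length added so far; reverse order sidesteps that bookkeeping entirely.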
@@ -0,0 +1,54 @@
from transformers import pipeline
from typing import Dict, List, Optional
from src.config import Config


class SentimentAnalyzer:
    """Sentiment analyzer using transformers"""

    def __init__(self, model_name: Optional[str] = None):
        """
        Initialize the sentiment-analysis pipeline

        Args:
            model_name: Name of the model to use (optional)
        """
        self.model_name = model_name or Config.get_model("sentiment")
        print(f"Loading sentiment model: {self.model_name}")
        self.pipeline = pipeline("sentiment-analysis", model=self.model_name)
        print("Model loaded successfully!")

    def analyze(self, text: str) -> Dict:
        """
        Analyze the sentiment of a text

        Args:
            text: Text to analyze

        Returns:
            Dictionary with label and confidence score
        """
        if not text.strip():
            return {"error": "Empty text"}

        try:
            result = self.pipeline(text)[0]
            return {
                "text": text,
                "sentiment": result["label"],
                "confidence": round(result["score"], 4)
            }
        except Exception as e:
            return {"error": f"Analysis error: {str(e)}"}

    def analyze_batch(self, texts: List[str]) -> List[Dict]:
        """
        Analyze the sentiment of multiple texts

        Args:
            texts: List of texts to analyze

        Returns:
            List of analysis results
        """
        return [self.analyze(text) for text in texts]
@@ -0,0 +1,59 @@
"""
Template for creating new pipelines
Copy this file and adapt it according to your needs
"""
from transformers import pipeline
from typing import Dict, List, Optional


class TemplatePipeline:
    """Template for a new pipeline"""

    def __init__(self, model_name: Optional[str] = None):
        """
        Initialize the pipeline

        Args:
            model_name: Name of the model to use (optional)
        """
        self.model_name = model_name or "distilbert-base-uncased"
        print(f"Loading model {self.model_name}...")

        # Replace "text-classification" with your task
        self.pipeline = pipeline("text-classification", model=self.model_name)
        print("Model loaded successfully!")

    def process(self, text: str) -> Dict:
        """
        Process a text

        Args:
            text: Text to process

        Returns:
            Dictionary with results
        """
        if not text.strip():
            return {"error": "Empty text"}

        try:
            result = self.pipeline(text)
            return {
                "text": text,
                "result": result,
                # Add other fields according to your needs
            }
        except Exception as e:
            return {"error": f"Processing error: {str(e)}"}

    def process_batch(self, texts: List[str]) -> List[Dict]:
        """
        Process multiple texts

        Args:
            texts: List of texts to process

        Returns:
            List of results
        """
        return [self.process(text) for text in texts]
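The template's two reusable patterns, the empty-input guard and list-comprehension batching, can be exercised without loading any model. A hypothetical stand-in class (`EchoPipeline` is invented here purely for illustration) that mirrors `TemplatePipeline`'s control flow:

```python
from typing import Dict, List


class EchoPipeline:
    """Hypothetical stand-in mirroring TemplatePipeline's control flow, no model needed."""

    def process(self, text: str) -> Dict:
        # Same guard as the template: reject blank input early
        if not text.strip():
            return {"error": "Empty text"}
        # Trivial "task" standing in for the transformers pipeline call
        return {"text": text, "result": text.upper()}

    def process_batch(self, texts: List[str]) -> List[Dict]:
        # Same batching pattern as the template
        return [self.process(t) for t in texts]


p = EchoPipeline()
print(p.process_batch(["hi", "   "]))
```

Swapping `text.upper()` for a real `pipeline(...)` call recovers the template's behavior.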
@@ -0,0 +1,82 @@
from transformers import pipeline
from typing import Dict, List, Optional

from src.config import Config


class TextGenerator:
    """Text generator using transformers"""

    def __init__(self, model_name: Optional[str] = None):
        """
        Initialize the text-generation pipeline

        Args:
            model_name: Name of the model to use (optional)
        """
        self.model_name = model_name or Config.get_model("textgen")
        print(f"Loading text generation model: {self.model_name}")
        self.pipeline = pipeline("text-generation", model=self.model_name)
        print("Model loaded successfully!")

    def generate(self, prompt: str, max_length: int = 100, num_return_sequences: int = 1,
                 temperature: float = 1.0, do_sample: bool = True) -> Dict:
        """
        Generate text from a prompt

        Args:
            prompt: Input text prompt
            max_length: Maximum length of generated text
            num_return_sequences: Number of sequences to generate
            temperature: Sampling temperature (higher = more random)
            do_sample: Whether to use sampling

        Returns:
            Dictionary with generated texts
        """
        if not prompt.strip():
            return {"error": "Empty prompt"}

        try:
            results = self.pipeline(
                prompt,
                max_length=max_length,
                num_return_sequences=num_return_sequences,
                temperature=temperature,
                do_sample=do_sample,
                pad_token_id=self.pipeline.tokenizer.eos_token_id
            )

            generations = [
                {
                    "text": result["generated_text"],
                    "continuation": result["generated_text"][len(prompt):].strip()
                }
                for result in results
            ]

            return {
                "prompt": prompt,
                "parameters": {
                    "max_length": max_length,
                    "num_sequences": num_return_sequences,
                    "temperature": temperature,
                    "do_sample": do_sample
                },
                "generations": generations
            }

        except Exception as e:
            return {"error": f"Generation error: {str(e)}"}

    def generate_batch(self, prompts: List[str], **kwargs) -> List[Dict]:
        """
        Generate text for multiple prompts

        Args:
            prompts: List of input prompts
            **kwargs: Generation parameters

        Returns:
            List of generation results
        """
        return [self.generate(prompt, **kwargs) for prompt in prompts]
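The `continuation` field relies on the fact that text-generation pipelines return the prompt followed by the new tokens in `generated_text`. A minimal sketch of that slicing, with a made-up generation string standing in for real model output:

```python
prompt = "Once upon a time"
# Hypothetical model output: prompt echoed back, followed by new tokens
generated_text = "Once upon a time there was a small CLI playground."

# Drop the echoed prompt, then strip the leading space between prompt and continuation
continuation = generated_text[len(prompt):].strip()
print(continuation)  # "there was a small CLI playground."
```

Note this prefix-slicing assumes the pipeline echoes the prompt verbatim, which holds for standard causal LM pipelines but would break if the tokenizer normalizes the prompt text.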