Generative AI Tools Development
1. Image Generation:
The Image Generation module is a FastAPI-based plugin that integrates generative models such as Stable Diffusion, SDXL, and ControlNet. It supports text-to-image and image-to-image workflows with configurable guidance scale, seed, resolution, and number of inference steps. The system loads models dynamically, applies LoRA weights and textual inversion embeddings to customize the base model, and uses GPU acceleration when available. Designed for use within interfaces like ComfyUI, it also supports ControlNet-based conditioning, live configuration updates, and multiple schedulers such as DPM and PNDM.
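The configurable parameters listed above could be collected and checked before a request reaches the model. The following is a minimal sketch only, not the plugin's actual code: the class name `GenerationRequest`, its field names and defaults, the scheduler keys, and the multiple-of-8 resolution check (a common constraint of Stable Diffusion pipelines) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical scheduler keys; the description only names "DPM" and "PNDM".
SUPPORTED_SCHEDULERS = ("dpm", "pndm")


@dataclass
class GenerationRequest:
    """Illustrative container for the generation parameters (names are assumed)."""
    prompt: str
    guidance_scale: float = 7.5
    seed: Optional[int] = None            # None means "pick a random seed"
    width: int = 512
    height: int = 512
    num_inference_steps: int = 30
    scheduler: str = "dpm"
    init_image_path: Optional[str] = None  # set only for image-to-image

    def validate(self) -> None:
        """Raise ValueError if any parameter is outside the assumed limits."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")
        # Assumption: latent-diffusion models need dimensions divisible by 8.
        if self.width % 8 or self.height % 8:
            raise ValueError("width and height must be multiples of 8")
        if self.num_inference_steps < 1:
            raise ValueError("num_inference_steps must be at least 1")
        if self.guidance_scale < 0:
            raise ValueError("guidance_scale must be non-negative")
        if self.scheduler not in SUPPORTED_SCHEDULERS:
            raise ValueError(f"unknown scheduler: {self.scheduler!r}")

    @property
    def mode(self) -> str:
        """Select the workflow: image-to-image when an input image is given."""
        return "img2img" if self.init_image_path else "txt2img"
```

A request such as `GenerationRequest(prompt="a red fox", seed=42)` would pass validation and report `txt2img` as its mode. Supplying `init_image_path` switches the same structure to the image-to-image workflow.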
2. Video Object Semantic and Instance Segmentation:
This module performs semantic and instance segmentation on video frames by combining Grounding DINO for text-prompted object detection with SAM (Segment Anything Model) for mask generation. Built with FastAPI and PyTorch, the plugin identifies and segments objects from text prompts, with customizable detection thresholds, dynamic model loading, and real-time execution. It generates binary masks from RGB frames, stores outputs efficiently, and runs on both CPU and GPU devices. It integrates with interfaces like ComfyUI and is well suited to video analysis tasks in AI research, robotics, and media processing.
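The hand-off from detection to mask generation can be sketched without loading either model. The example below assumes that the detector returns boxes as normalized (cx, cy, w, h) values with a confidence score, and that the mask generator expects pixel-space (x0, y0, x1, y1) boxes. The `Detection` class, both function names, and the 0.35 default threshold are illustrative and not taken from the plugin.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One detected object (hypothetical structure for illustration)."""
    label: str
    score: float
    box_cxcywh: Tuple[float, float, float, float]  # normalized to [0, 1]


def filter_detections(detections: List[Detection],
                      box_threshold: float = 0.35) -> List[Detection]:
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d.score >= box_threshold]


def to_pixel_xyxy(box: Tuple[float, float, float, float],
                  width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1),
    clamped to the image bounds."""
    cx, cy, w, h = box
    x0 = (cx - w / 2) * width
    y0 = (cy - h / 2) * height
    x1 = (cx + w / 2) * width
    y1 = (cy + h / 2) * height

    def clamp(value: float, upper: int) -> int:
        return int(min(max(round(value), 0), upper))

    return (clamp(x0, width), clamp(y0, height),
            clamp(x1, width), clamp(y1, height))
```

In this sketch, each box that survives `filter_detections` would be converted with `to_pixel_xyxy` and passed to the mask generator as a prompt, producing one binary mask per detected instance.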
3. Video Super Resolution:
The Video Super Resolution plugin increases the resolution and visual quality of images and video sequences using deep learning models such as ESRGAN, SwinIR, and BasicVSR. Built with FastAPI and PyTorch, it allows flexible model selection and processes frames in intervals to reduce memory overhead. It supports both single-image enhancement and batch super-resolution of video frames, including low-resolution or compressed media. The plugin handles large inputs by adapting the processing interval and uses GPU acceleration when available, making it well suited to restoring visual quality in AI-generated content, archival footage, or media upscaling pipelines.
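Interval-based processing can be illustrated with a small scheduling helper. This is a sketch under stated assumptions: the pixel-budget heuristic, the function names `choose_interval` and `iter_intervals`, and the `max_pixels_per_batch` parameter are hypothetical, and the plugin may adapt its interval differently.

```python
from typing import Iterator, Tuple


def choose_interval(num_frames: int, width: int, height: int,
                    max_pixels_per_batch: int = 8_000_000) -> int:
    """Pick how many frames to process together so that the batch stays
    within an assumed pixel budget. Larger frames give smaller intervals."""
    pixels_per_frame = width * height
    interval = max(1, max_pixels_per_batch // pixels_per_frame)
    # Never exceed the clip length, and never return less than 1.
    return max(1, min(interval, num_frames))


def iter_intervals(num_frames: int, interval: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) frame ranges covering the whole clip."""
    for start in range(0, num_frames, interval):
        yield start, min(start + interval, num_frames)
```

Each yielded range would be loaded, upscaled by the selected model, and written out before the next range is read, so peak memory depends on the interval rather than on the full clip length.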