AI Engineer & Computer Vision Researcher

Amir
Soltani

Building the future of
Deep Learning · Medical AI · Real-Time Vision Systems · Edge Deployment

1.8K
GitHub Followers
84
Repositories
5+
Years Experience
20+
AI Projects
Available for Projects
🏆 LPIC Certified · Linux

Passion Meets
Precision

I'm Amir Soltani, an AI engineer with deep roots in computer vision, medical imaging, and real-time deep learning systems. With over 1.8K GitHub followers and 84 repositories, I've built production-grade AI across healthcare, automotive, satellite imagery, and edge devices.

I specialize in pushing the state of the art beyond benchmarks, whether that's achieving 42 mm MPJPE on Human3.6M with Transformer architectures, deploying tumor segmentation to hospitals with a DSC of 0.92, or running YOLO-class detection in under 30 ms on embedded hardware.

My philosophy: Knowledge is not skill. Knowledge plus ten thousand hours is skill.

🏥
Medical AI
Deployed to 5 hospitals
Real-Time Vision
<30ms TensorRT latency
🛰️
Remote Sensing
Sentinel & Maxar datasets
🎓
Edge AI
Raspberry Pi · Android · TFLite
Technical Arsenal

Skills &
Expertise

A comprehensive stack from research to production deployment across cloud, edge, and embedded platforms.

🧠 Programming Languages
Python · C++ · MATLAB · Kotlin · Bash/Shell
🔥 Deep Learning Frameworks
PyTorch · TensorFlow · Keras · HuggingFace · MMDetection · MONAI
👁️ Computer Vision
OpenCV · YOLO Variants · MediaPipe · Albumentations · Scikit-Image · TorchGeo
Edge & Deployment
TensorRT · TF Lite · DeepStream · ONNX · Raspberry Pi · Android/Kotlin
🛠️ MLOps & Tools
Docker · Git/GitHub/GitLab · Azure · AWS · Comet ML · Neptune · Roboflow · LabelBox · CVAT
🌍 Geo & Autonomous
GDAL · Rasterio · GeoPandas · CARLA Simulator · Openpilot · SAM2
🤖 AI/LLM & Automation
LangChain · N8N · RAG Pipelines · Windsurf · Cursor · Claude
🎯 Specializations
Object Detection · Instance Segmentation · Pose Estimation · Few-Shot Learning · Medical Imaging · Remote Sensing · VLMs
Portfolio

Featured
Projects

Production-grade AI systems spanning medical imaging, autonomous vehicles, satellite analysis, and real-time edge deployment.

● Medical AI
Automated Medical Image Segmentation
U-Net system for brain tumor segmentation on BraTS 2021. Active learning pipeline reducing annotation time by 35%. Deployed across 5 hospitals via MONAI framework.
0.92 DSC Score · −30% Workload · 5 Hospitals
U-Net · MONAI · Active Learning · BraTS 2021
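For readers unfamiliar with the DSC figure quoted above, the Dice similarity coefficient measures overlap between a predicted mask and a reference mask as twice the intersection divided by the sum of both mask sizes. The snippet below is a minimal pure-Python sketch of that definition on flattened binary masks. The function name and the toy masks are illustrative assumptions, not the project's evaluation code.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient for two flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    # Overlap: positions where both masks are 1.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly, so define DSC as 1.0 in that case.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example (hypothetical data): 2 overlapping voxels, 3 positives in each mask.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(round(dice_coefficient(pred, target), 3))
```

A score of 1.0 means the masks match exactly and 0.0 means they do not overlap at all.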
● Medical AI
Tooth Detection, Segmentation & Numbering
Intelligent Dental Imaging System for Endodontic Surgery Prediction. Cascade R-CNN on OPG X-rays with contour analysis for overlapping tooth disambiguation.
96% Accuracy · −15 min/patient · +27% Numbering
Cascade R-CNN · Dental AI · OPG X-rays · CAD Integration
● Medical AI
Medical Image Enhancement & Super-Resolution
MRI denoising with SNR improvement from 12 dB to 18 dB on the IXI dataset. CT contrast enhancement via CLAHE for liver lesion detection. Integrated into PACS.
12→18 dB SNR · −22% Prep Time · PACS Ready
Few-Shot · CLAHE · 3D Segmentation · IXI Dataset
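To put the 12 dB to 18 dB improvement in perspective, the sketch below converts a decibel gain into a linear ratio. It assumes the common power-based definition, SNR = 10·log10(P_signal / P_noise). The source does not say which SNR definition was used, so treat the numbers as illustrative; the function names are hypothetical.

```python
import math


def snr_db(signal_power, noise_power):
    """SNR in decibels, assuming the power-ratio definition."""
    return 10.0 * math.log10(signal_power / noise_power)


def db_gain_to_power_ratio(gain_db):
    """Convert a gain in dB to the equivalent linear power ratio."""
    return 10.0 ** (gain_db / 10.0)


gain_db = 18.0 - 12.0  # improvement quoted in the project card
print(round(db_gain_to_power_ratio(gain_db), 2))
```

Under that assumption, a 6 dB gain corresponds to roughly a fourfold improvement in the signal-to-noise power ratio.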
● Medical AI
Retinal Vessel Segmentation & Lesion Grading
Transformer-UNet hybrid for automated diabetic retinopathy grading on DRIVE & STARE datasets. Multi-scale feature fusion for accurate vessel tree extraction and microaneurysm detection.
0.94 AUC · 97.2% Sensitivity
ViT-UNet · DRIVE Dataset · Retina AI
● Computer Vision
3D Human Body Pose Estimation
Transformer-based pose prediction with robust occlusion handling, evaluated on the Human3.6M benchmark. Real-time deployment via TensorRT with under 30 ms latency.
42mm MPJPE · <30ms Latency · −4% vs Baseline
Transformer · TensorRT · Human3.6M · Real-Time
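MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joints, usually reported in millimetres. The following is a minimal sketch of the bare metric for a single pose. It omits the root-joint alignment that benchmark protocols often apply, and the joint coordinates are made-up values for illustration.

```python
import math


def mpjpe(pred_joints, gt_joints):
    """Mean per-joint position error between two lists of (x, y, z) joints."""
    if not pred_joints or len(pred_joints) != len(gt_joints):
        raise ValueError("need two non-empty joint lists of equal length")
    # Average the per-joint Euclidean distances.
    distances = [math.dist(p, g) for p, g in zip(pred_joints, gt_joints)]
    return sum(distances) / len(distances)


# Hypothetical two-joint pose, coordinates in millimetres.
pred = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
gt = [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
print(mpjpe(pred, gt))
```

In practice the value is averaged over all frames of the evaluation set, not just one pose.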
● Computer Vision
ViUNet: Vision Transformer UNet for Segmentation
Novel architecture merging U-Net's localization strengths with ViT's global context modeling. Multi-scale attention mechanisms for robust feature representation across diverse imaging modalities.
Multi-Scale Attn · Global Context
ViT · U-Net · Attention · Custom Architecture
● Computer Vision
Multi-Modal Detection: Pose, Face & Hand
MediaPipe/OpenCV unified system for simultaneous pose, facial landmark, and hand gesture tracking. Low-light face detection optimized via histogram equalization. Deployed in AR applications.
Real-Time · 3-in-1 Pipeline
MediaPipe · OpenCV · AR · Landmark Detection
● Computer Vision
Gym Workout Recognition via ST-GCN
Spatiotemporal Graph Convolutional Network for real-time gym exercise classification. ONNX deployment enabling efficient inference on fitness applications with 88% accuracy on UCF101.
88% Accuracy · UCF101 · ONNX Deployed
ST-GCN · ONNX · Action Recognition
● Computer Vision
Image Restoration: Real-ESRGAN & GFPGAN
Historical image restoration pipeline using Real-ESRGAN. Fine-tuned GFPGAN for high-fidelity facial reconstruction. HAT super-resolution model deployed in a production mobile application.
4× Super-Res · Mobile App
Real-ESRGAN · GFPGAN · HAT · Mobile Deploy
● Computer Vision
Few-Shot Fashion Recommendation System
Hybrid few-shot learning approach for visual fashion item retrieval and recommendation. Siamese networks combined with transformer embeddings for outfit compatibility scoring.
Few-Shot · Siamese Network
Few-Shot · Siamese Net · Transformers
● Detection
Thermal Object Detection: YOLO–Transformer Hybrid
Real-time colormap-invariant thermal detection framework. Hybrid YOLO–Transformer architecture achieving robust detection across varied thermal colormap configurations in industrial environments.
Colormap-Invariant · Real-Time
Thermal Imaging · YOLO · Transformer · Industrial AI
● Detection
License Plate Recognition with YOLOv7
End-to-end plate detection and OCR recognition pipeline using YOLOv7. Optimized for varying lighting, angles, and international plate formats with high-speed inference.
YOLOv7 · Multi-Format
YOLOv7 · OCR · LPR · Real-Time
● Detection
Multi-Object Tracking with DeepSORT & ByteTrack
Robust multi-object tracking pipeline combining YOLOv8 detection with DeepSORT and ByteTrack. Real-time speed estimation and trajectory analysis for surveillance and traffic monitoring.
MOTA 78.4% · 30+ FPS
DeepSORT · ByteTrack · YOLOv8 · Tracking
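MOTA (Multiple Object Tracking Accuracy) combines three error types into one score: 1 minus the sum of missed targets, false positives, and identity switches, divided by the total number of ground-truth objects across all frames. The sketch below shows only that arithmetic. The counts are invented for illustration and are not the project's results.

```python
def mota(false_negatives, false_positives, id_switches, num_gt_objects):
    """Multiple Object Tracking Accuracy from error counts summed over all frames."""
    if num_gt_objects <= 0:
        raise ValueError("num_gt_objects must be positive")
    errors = false_negatives + false_positives + id_switches
    # MOTA can go negative when errors outnumber ground-truth objects.
    return 1.0 - errors / num_gt_objects


# Hypothetical counts for a short sequence.
score = mota(false_negatives=10, false_positives=5, id_switches=1, num_gt_objects=100)
print(f"{score:.2%}")
```

Because all three error types share one denominator, a tracker can lose MOTA through identity switches even when its detections are accurate.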
● Detection
Industrial Defect Detection System
Automated quality control pipeline for manufacturing defect detection. Anomaly detection via PatchCore and supervised YOLOv8 for surface scratch, dent, and contamination classification.
99.1% Precision · Zero False-Pass
PatchCore · Anomaly Detection · Manufacturing
● Satellite / Remote Sensing
Scalable High-Res Object Detection via Sentinel
Rare-object detection and matching framework for large Sentinel satellite tiles. Sliding-window + attention-guided detection with geospatial coordinate output and GeoJSON export.
Large Tile · GeoJSON Output
Sentinel · GDAL · Rasterio · TorchGeo
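The sliding-window idea mentioned above can be sketched in a few lines: cover a large raster with overlapping windows, run a detector on each, then map pixel hits back to map coordinates. The code below is a hedged illustration only. The window and stride sizes are arbitrary, the helper names are hypothetical, and `pixel_to_geo` assumes a six-element affine geotransform in the GDAL convention. It does not reproduce the project's attention-guided detector or its GeoJSON export.

```python
def axis_offsets(length, window, stride):
    """Start offsets along one axis so windows reach the far edge."""
    offsets = list(range(0, max(length - window, 0) + 1, stride))
    # Add a final window flush with the edge if the regular grid stops short.
    if offsets[-1] + window < length:
        offsets.append(length - window)
    return offsets


def sliding_windows(width, height, window, stride):
    """Yield (x, y, w, h) windows covering a width x height raster."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    for y in axis_offsets(height, window, stride):
        for x in axis_offsets(width, window, stride):
            yield x, y, min(window, width - x), min(window, height - y)


def pixel_to_geo(geotransform, col, row):
    """Map a pixel position to map coordinates (GDAL-style geotransform)."""
    x0, px_w, rot_x, y0, rot_y, px_h = geotransform
    return x0 + col * px_w + row * rot_x, y0 + col * rot_y + row * px_h


# Hypothetical 1000 x 600 tile, 512-pixel windows, 384-pixel stride.
windows = list(sliding_windows(1000, 600, window=512, stride=384))
print(len(windows), windows[-1])

# Hypothetical north-up geotransform with 10 m pixels.
gt = (500000.0, 10.0, 0.0, 4000000.0, 0.0, -10.0)
print(pixel_to_geo(gt, col=5, row=2))
```

A stride smaller than the window gives overlap, so an object cut by one window boundary appears whole in a neighbouring window.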
● Satellite / Remote Sensing
Road Segmentation via Maxar Open Data & SAM2
Zero-shot and few-shot road network extraction from high-resolution Maxar imagery using Segment Anything Model 2. Automated pipeline for urban planning and infrastructure mapping.
SAM2 · Zero-Shot
SAM2 · Maxar · GeoPandas · Road Network
● Satellite / Remote Sensing
Fire & Smoke Detection — Industrial & Wildfire
Dual-domain detection system for fire and smoke in both industrial environments and wildfire satellite imagery. Temporal analysis for early-warning systems with drone integration capability.
Early Warning · Multi-Domain
Fire Detection · Temporal CV · Drone Integration
● Satellite / Remote Sensing
Change Detection in Multitemporal Satellite Imagery
Siamese network architecture for detecting land-use and structural changes between multitemporal satellite image pairs. Deployed for flood damage assessment and urban expansion monitoring.
Siamese Net · Flood Assessment
Change Detection · Siamese · Multitemporal
● Edge AI
YOLOv11-EdgeSuite: Universal Mobile Vision Toolkit
Android mobile vision toolkit built in Kotlin using YOLOv11 for detection and segmentation on smartphones. Real-time inference via TensorFlow Lite with camera API integration.
Android · Real-Time · TF Lite
YOLOv11 · Kotlin · TF Lite · Android
● Edge AI
Edge/Corner Detection on Raspberry Pi 4
GPU-accelerated Harris Corner and Sobel edge detection for real-time video processing on Raspberry Pi 4. Adaptive thresholding for robust low-light performance.
GPU Accelerated · Real-Time Video
Harris Corner · Sobel · Raspberry Pi 4 · OpenCV
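As a reference for what the Sobel step computes, here is a small CPU-only, pure-Python sketch that convolves a grayscale image with the two standard 3×3 Sobel kernels and returns the gradient magnitude. It is a teaching illustration on a toy image, not the GPU-accelerated pipeline described above, and it leaves out Harris corner detection and adaptive thresholding.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(image):
    """Gradient magnitude for interior pixels of a 2D list of intensities."""
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            gx = gy = 0
            # Accumulate both kernel responses over the 3x3 neighbourhood.
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            out_row.append(math.hypot(gx, gy))
        result.append(out_row)
    return result


# Toy 4x4 image with a vertical step edge between columns 1 and 2.
image = [[0, 0, 10, 10] for _ in range(4)]
edges = sobel_magnitude(image)
print(edges)
```

The output is two pixels smaller in each dimension because border pixels lack a full 3×3 neighbourhood.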
● Edge AI
Autonomous Driving Perception — CARLA + Openpilot
Full AV perception stack integrating object detection, lane segmentation, and depth estimation within the CARLA simulator. Openpilot-compatible policy evaluation for urban and highway scenarios.
CARLA Sim · Full Stack
CARLA · Openpilot · Depth Est. · Lane Seg
● Edge AI
Sign Language Recognition — Real-Time ASL
MediaPipe Holistic + LSTM temporal classifier for real-time American Sign Language recognition. Deployed on embedded hardware with under 20 ms latency and 94% word-level accuracy.
94% Word Acc. · <20ms Latency
MediaPipe · LSTM · ASL · Edge Deploy
Publications & Research

Research
Contributions

Contributing to state-of-the-art in computer vision, medical AI, and deep learning architectures.

01
ViUNet: Vision Transformer-Based UNet for Image Segmentation
Novel architecture · Multi-scale Attention · Cross-domain Generalization
Research
02
Few-Shot Cascade R-CNN for Medical Object Detection & Segmentation
Medical Imaging · Few-Shot Learning · Cascade Architecture
Applied
03
Real-Time Colormap-Invariant Thermal Detection via Hybrid YOLO–Transformer
Industrial AI · Thermal Imaging · Domain Adaptation
Research
04
Rare Object Detection & Matching in Large Satellite Tiles
Remote Sensing · Geospatial AI · Sentinel Imagery
In Progress
05
2D–3D Super-Resolution–Driven Detection of Medical Imaging Targets
Super-Resolution · 3D Medical Reconstruction · Diagnostic AI
Applied
06
Adaptive Few-Shot Density HeatMap Generation for Crowd Analysis
Crowd Counting · Density Estimation · Few-Shot Adaptation
In Progress
GitHub Activity

By The
Numbers

76
Stars Earned
👥
1.8K
Followers
📦
84
Repositories
🏆
6
Achievements
🔀
20+
Forks

GitHub Stats

Trophies

Top Languages

Let's Build
Something
Extraordinary

Open to collaborations, research partnerships, consulting engagements, and challenging AI projects. Whether it's medical imaging, real-time vision, or autonomous systems — let's connect.

Send Email →