This is a third-party implementation of the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
Contributions: 1 PR, 22 pushes, 14 comments in 2 months
object-detection, open-world, open-world-detection, vision-language
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Contributions: 6 pushes, 19 comments in 4 months
large-multimodal-models, multimodal-large-language-models