<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Jack Wang</title>
    <description>The latest articles on Forem by Jack Wang (@jack_wang_d47b1f7f781c64f).</description>
    <link>https://forem.com/jack_wang_d47b1f7f781c64f</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3660680%2F2a1509c8-1b17-4258-a895-aea792f5cfda.png</url>
      <title>Forem: Jack Wang</title>
      <link>https://forem.com/jack_wang_d47b1f7f781c64f</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jack_wang_d47b1f7f781c64f"/>
    <language>en</language>
    <item>
      <title>Just added SAM3 video object tracking to X-AnyLabeling!</title>
      <dc:creator>Jack Wang</dc:creator>
      <pubDate>Sat, 03 Jan 2026 17:10:03 +0000</pubDate>
      <link>https://forem.com/jack_wang_d47b1f7f781c64f/just-added-sam3-video-object-tracking-to-x-anylabeling-33md</link>
      <guid>https://forem.com/jack_wang_d47b1f7f781c64f/just-added-sam3-video-object-tracking-to-x-anylabeling-33md</guid>
      <description>&lt;p&gt;Hey everyone!&lt;/p&gt;

&lt;p&gt;Just wanted to share that we've integrated SAM3's video object tracking into X-AnyLabeling. If you're doing video annotation work, this might save you some time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track objects across video frames automatically&lt;/li&gt;
&lt;li&gt;Works with text prompts (just type "person", "car", etc.) or visual prompts (click a few points)&lt;/li&gt;
&lt;li&gt;Non-overwrite mode so it won't mess with your existing annotations&lt;/li&gt;
&lt;li&gt;You can start tracking from any frame in the video&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compared to the original SAM3 implementation, ours is optimized for more stable memory usage and faster inference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The cool part:&lt;/strong&gt; Unlike SAM2, SAM3 can segment all instances of an open-vocabulary concept. So if you type "bicycle", it'll find and track every bike in the video, not just one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;br&gt;
For text prompting, you just enter the object name and hit send. For visual prompting, you click a few points (positive/negative) to mark what you want to track, then it propagates forward through the video.&lt;/p&gt;

&lt;p&gt;We've also got Label Manager and Group ID Manager tools if you need to batch-edit track_ids or labels afterward.&lt;/p&gt;

&lt;p&gt;It's part of the latest release (v3.3.4). You'll need X-AnyLabeling-Server v0.0.4+ running. Model weights are available on ModelScope (for users in China) or you can grab them from GitHub releases.&lt;/p&gt;

&lt;p&gt;Setup guide: &lt;a href="https://github.com/CVHub520/X-AnyLabeling/blob/main/examples/interactive_video_object_segmentation/sam3/README.md" rel="noopener noreferrer"&gt;https://github.com/CVHub520/X-AnyLabeling/blob/main/examples/interactive_video_object_segmentation/sam3/README.md&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Anyone else working on video annotation? Would love to hear what workflows you're using or if you've tried SAM3 for this kind of thing.&lt;/p&gt;

</description>
      <category>segmentanything3</category>
      <category>xanylabeling</category>
      <category>computervision</category>
      <category>ai</category>
    </item>
    <item>
      <title>Meet X-AnyLabeling: The Python-native, AI-powered Annotation Tool for Modern CV 🚀</title>
      <dc:creator>Jack Wang</dc:creator>
      <pubDate>Sun, 14 Dec 2025 02:32:39 +0000</pubDate>
      <link>https://forem.com/jack_wang_d47b1f7f781c64f/meet-x-anylabeling-the-python-native-ai-powered-annotation-tool-for-modern-cv-507b</link>
      <guid>https://forem.com/jack_wang_d47b1f7f781c64f/meet-x-anylabeling-the-python-native-ai-powered-annotation-tool-for-modern-cv-507b</guid>
      <description>&lt;h2&gt;
  
  
  The "Data Nightmare" 😱
&lt;/h2&gt;

&lt;p&gt;Let’s be honest for a second.&lt;/p&gt;

&lt;p&gt;As AI engineers, we love tweaking hyperparameters, designing architectures, and watching loss curves go down. But there is one part of the job that universally sucks: &lt;strong&gt;Data Labeling.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It’s the unglamorous bottleneck of every project. If you've ever spent a weekend manually drawing 2,000 bounding boxes on a dataset, you know the pain.&lt;/p&gt;

&lt;p&gt;I realized the tooling landscape was broken:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Commercial SaaS:&lt;/strong&gt; Great features, but expensive and I hate uploading sensitive data to the cloud.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Old-school OSS (LabelImg/Labelme):&lt;/strong&gt; Simple, but "dumb." No AI assistance means 100% manual labor.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Heavy Web Suites (CVAT):&lt;/strong&gt; Powerful, but requires a complex Docker deployment just to label a folder of images.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted something different. I wanted a tool that felt like a lightweight desktop app but had the brain of a modern AI model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqti6jwvv9iwjnx9waiyb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqti6jwvv9iwjnx9waiyb.png" alt="X-AnyLabeling’s Vision" width="720" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, I built &lt;strong&gt;X-AnyLabeling&lt;/strong&gt;. And today, we are releasing &lt;strong&gt;Version 3.0&lt;/strong&gt;. 🎉&lt;/p&gt;

&lt;h2&gt;
  
  
  What is X-AnyLabeling? 🤖
&lt;/h2&gt;

&lt;p&gt;X-AnyLabeling is a desktop-based data annotation tool built with Python and Qt. But unlike traditional tools, it’s designed to be &lt;strong&gt;"AI-First."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The philosophy is simple: &lt;strong&gt;Never label from scratch if a model can do a draft for you.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Whether you are doing Object Detection, Segmentation, Pose Estimation, or even Multimodal VQA, X-AnyLabeling lets you run a model (like YOLO, SAM, or Qwen-VL) to pre-label the data. You just verify and correct.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26b9geenj8er1ls26x73.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26b9geenj8er1ls26x73.png" alt="X-AnyLabeling Ecosystem" width="720" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is what’s new in v3.0 and why it matters for developers.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. Finally, a PyPI Package 📦
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flunki1tnyg3b3gcqxlra.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flunki1tnyg3b3gcqxlra.png" alt="X-AnyLabeling Pypi" width="720" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the past, you had to clone the repo and pray the dependencies didn't break. We fixed that. You can now install the whole suite with a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install with GPU support (CUDA 12.x)&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;x-anylabeling-cvhub[cuda12]

&lt;span class="c"&gt;# Or just the CPU version&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;x-anylabeling-cvhub[cpu]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also added a &lt;strong&gt;CLI tool&lt;/strong&gt; for those who love the terminal. Need to convert a YOLO dataset to X-AnyLabeling's label format? Don't write a script; just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;xanylabeling convert &lt;span class="nt"&gt;--task&lt;/span&gt; yolo2xlabel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. The "Remote Server" Architecture ☁️ -&amp;gt; 🖥️
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwq75uuna5v06c18mtk5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwq75uuna5v06c18mtk5.png" alt="X-AnyLabeling-Server" width="720" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a big one for teams. Running a heavy model (like SAM3 or a large VLM) on an annotator's laptop is slow, if it runs at all.&lt;/p&gt;

&lt;p&gt;We introduced &lt;strong&gt;X-AnyLabeling-Server&lt;/strong&gt;, a lightweight FastAPI backend.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Server:&lt;/strong&gt; You deploy the heavy models on a GPU machine.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Client:&lt;/strong&gt; The annotator uses the lightweight UI on their laptop.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Result:&lt;/strong&gt; Fast inference via REST API without local hardware constraints.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It supports custom models, Ollama, and Hugging Face Transformers out of the box.&lt;/p&gt;
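
&lt;p&gt;To make the client/server split concrete, here's a rough sketch of what a client-side call could look like. Note that the endpoint path, port, and payload keys below are illustrative assumptions, not the server's actual API; check the X-AnyLabeling-Server docs for the real contract.&lt;/p&gt;

```python
import base64

# NOTE: the route, port, and payload schema here are hypothetical,
# shown only to illustrate the "thin client -> GPU server" idea.
SERVER_URL = "http://gpu-box:8000/predict"  # assumed endpoint

def build_payload(image_path: str, prompt: str = "person") -> dict:
    """Encode an image and a text prompt into a JSON-serializable payload."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {"image": image_b64, "prompt": prompt}

def request_predictions(image_path: str, prompt: str = "person") -> dict:
    """POST the payload to the GPU server and return its JSON response."""
    import requests  # kept local so the sketch imports without the dependency
    payload = build_payload(image_path, prompt)
    return requests.post(SERVER_URL, json=payload, timeout=30).json()
```

&lt;p&gt;The point is the shape of the workflow: the laptop only serializes inputs and draws results, while all the model weights stay on the server.&lt;/p&gt;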

&lt;h3&gt;
  
  
  3. The "Label-Train-Loop" with Ultralytics 🔄
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6f25evmgr8fkbnu9n0p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6f25evmgr8fkbnu9n0p.png" alt="Auto Training in X-AnyLabeling" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We integrated the &lt;a href="https://github.com/ultralytics/ultralytics" rel="noopener noreferrer"&gt;Ultralytics&lt;/a&gt; framework directly into the GUI.&lt;/p&gt;

&lt;p&gt;You can now:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Label a batch of images.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Click "Train" inside the app.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Wait for the YOLO model to finish training.&lt;/li&gt;
&lt;li&gt; Load that new model back into the app to auto-label the &lt;em&gt;next&lt;/em&gt; batch of images.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This creates a positive feedback loop that drastically speeds up dataset creation.&lt;/p&gt;
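
&lt;p&gt;If you're curious what one turn of the loop looks like in code, here's a minimal sketch using the Ultralytics Python API directly. The dataset path, base weights, and epoch count are illustrative defaults, not X-AnyLabeling's internals.&lt;/p&gt;

```python
# One iteration of the label-train-loop, sketched with the Ultralytics
# Python API (pip install ultralytics). Paths and hyperparameters are
# placeholders for your own project.

def train_on_labeled_batch(data_yaml: str,
                           base_weights: str = "yolov8n.pt",
                           epochs: int = 50):
    """Train a YOLO model on the annotations exported so far."""
    from ultralytics import YOLO  # heavy import kept local to the function
    model = YOLO(base_weights)
    model.train(data=data_yaml, epochs=epochs, imgsz=640)
    return model

def prelabel_next_batch(model, image_dir: str):
    """Run the freshly trained model over the next, still-unlabeled batch."""
    return model.predict(source=image_dir, save=False)
```

&lt;p&gt;Inside the app you never touch this code; you click "Train", but under the hood it's the same idea: train on batch N, pre-label batch N+1, correct, repeat.&lt;/p&gt;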

&lt;h3&gt;
  
  
  4. Multimodal &amp;amp; Chatbot Capabilities 💬
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg57e374vc5ts3bmah2z7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg57e374vc5ts3bmah2z7.png" alt="Chatbot" width="720" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Computer Vision isn't just boxes anymore. We added features for the &lt;strong&gt;LLM/VLM era&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;VQA Mode:&lt;/strong&gt; Structured annotation for document parsing or visual Q&amp;amp;A.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Chatbot:&lt;/strong&gt; Connect to GPT-4, Gemini, or local models to "chat" with your images and auto-generate captions.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Export:&lt;/strong&gt; One-click export to &lt;code&gt;ShareGPT&lt;/code&gt; format for fine-tuning with LLaMA-Factory.&lt;/li&gt;
&lt;/ul&gt;
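
&lt;p&gt;For a sense of what the ShareGPT export produces, here's a sketch of a single record in the ShareGPT-style layout that LLaMA-Factory's multimodal datasets commonly use. The keys follow the widely used convention; verify against the exporter's actual output before fine-tuning.&lt;/p&gt;

```python
import json

# A ShareGPT-style record: a conversation plus the image(s) it refers to.
# The file paths and Q&A text are made up for illustration.
record = {
    "conversations": [
        {"from": "human", "value": "<image>What is in this picture?"},
        {"from": "gpt", "value": "A cyclist crossing an intersection."},
    ],
    "images": ["images/000001.jpg"],
}

print(json.dumps(record, indent=2))
```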




&lt;h2&gt;
  
  
  Supported Models (The "Batteries Included" List) 🔋
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvml9gk9lopo7tlndz974.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvml9gk9lopo7tlndz974.png" alt="X-AnyLabeling's model zoo" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We support &lt;strong&gt;100+ models&lt;/strong&gt; out of the box. You don't need to write inference code; just select them from the dropdown.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Segmentation:&lt;/strong&gt; SAM 1/2/3, MobileSAM, EdgeSAM.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Detection:&lt;/strong&gt; YOLOv5/8/10/11, RT-DETR, Gold-YOLO.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;OCR:&lt;/strong&gt; PP-OCRv5 (Great for multilingual text).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Multimodal:&lt;/strong&gt; Qwen-VL, ChatGLM, GroundingDINO.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try it out! 🛠️
&lt;/h2&gt;

&lt;p&gt;This project is 100% Open Source.&lt;/p&gt;

&lt;p&gt;We've hit &lt;strong&gt;7.5k stars&lt;/strong&gt; on GitHub, and we're just getting started. If you are tired of manual labeling or struggling with complex web-based annotation tools, give X-AnyLabeling a spin.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;GitHub Repo:&lt;/strong&gt; &lt;a href="https://github.com/CVHub520/X-AnyLabeling" rel="noopener noreferrer"&gt;https://github.com/CVHub520/X-AnyLabeling&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Docs:&lt;/strong&gt; &lt;a href="https://github.com/CVHub520/X-AnyLabeling/tree/main/docs" rel="noopener noreferrer"&gt;Full Documentation&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’d love to hear your feedback in the comments! What features are you missing in your current data pipeline? 👇&lt;/p&gt;

</description>
      <category>ai</category>
      <category>computervision</category>
      <category>deeplearning</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
