Commit
update websites
boyugou committed Oct 7, 2024
1 parent cffc279 commit 83a7372
Showing 1 changed file with 1 addition and 1 deletion.
index.html (1 addition, 1 deletion)
@@ -192,7 +192,7 @@ <h2 class="subtitle is-3 publication-subtitle">
 <p>
 <b>UGround</b> is a universal <b>visual grounding</b> model that locates the target element of an action by pixel coordinates on the screen. It is trained on 10M elements from 1.3M screenshots and substantially outperforms previous SOTA GUI visual grounding models.
 <br>
-We propose a generic framework, <b>SeeAct-V</b>, that perceives GUIs entirely visually and performs pixel-level operations on screens. SeeAct-V agents with UGround achieve SOTA prompt-only performance on six benchmarks, spanning <b>GUI Grounding</b> (web, mobile, desktop), <b>offline agent evaluation</b> (web, mobile, desktop), and <b>online agent evaluation</b> (web, mobile):
+Unlike prevalent approaches that rely on HTML/A11y trees for observation or grounding, we propose a generic framework, <b>SeeAct-V</b>, that <b>perceives GUIs entirely visually</b> and <b>performs pixel-level operations</b> on screens. SeeAct-V agents with UGround achieve SOTA performance on six benchmarks, spanning <b>GUI Grounding</b> (web, mobile, desktop), <b>offline agent evaluation</b> (web, mobile, desktop), and <b>online agent evaluation</b> (web, mobile):
 <!-- <ul>-->
 <!--&lt;!&ndash; <li>🌐 📱 💻 ScreenSpot (GUI Grounding)</li>&ndash;&gt;-->
 <!--&lt;!&ndash; <li>🌐 Multimodal-Mind2Web </li>&ndash;&gt;-->
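The paragraph in this diff describes a pipeline in which a grounding model maps a screenshot plus a natural-language element description to pixel coordinates, and the agent then acts at those coordinates rather than on an HTML/A11y tree. A minimal sketch of that loop, with hypothetical `ground_element` and `click_at` helpers (illustrative names, not UGround's actual API):

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: int  # pixel column on the screenshot
    y: int  # pixel row on the screenshot

def ground_element(screenshot_png: bytes, description: str) -> Point:
    """Hypothetical stand-in for a visual grounding model such as UGround:
    given a screenshot and a textual description of the target element,
    return its pixel coordinates. Model inference would go here."""
    raise NotImplementedError

def click_at(point: Point) -> None:
    """Hypothetical pixel-level executor: issue an OS- or browser-level
    click at absolute screen coordinates."""
    raise NotImplementedError

def act(screenshot_png: bytes, description: str) -> None:
    # Perceive entirely visually: no HTML or A11y tree is consulted.
    target = ground_element(screenshot_png, description)
    click_at(target)
```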
