NaviSense App Revolutionizes Real-Time Object Recognition for Visually Impaired
November 24, 2025
NaviSense, a smartphone app developed at Penn State with input from visually impaired users, uses artificial intelligence to locate objects in real time, responding to spoken prompts with the phone’s audio and vibration cues.
The system interprets voice prompts, scans the surroundings, and identifies targets with audio and haptic cues, asking follow-up questions when clarification is needed.
A standout feature is hand guidance: the app tracks the user’s hand through the phone and steers it toward the target object, addressing a gap users themselves identified.
The current version is effective, but the team is optimizing power consumption and further improving LLM/VLM efficiency ahead of commercialization, with the technology described as close to market release.
The technology represents a shift from static, pre-programmed assistive tools to dynamic, conversational AI that can adapt to a wide range of environments and objects, aiming to increase independence and reduce reliance on human assistance.
Wider implications include advancing embodied AI through conjoined LLMs and VLMs and demonstrating the maturation of vision-language models for practical accessibility, while raising considerations around accuracy, reliability, computational demands, battery life, data privacy, and security.
The research emphasizes addressing privacy concerns and avoiding reliance on in-person support by using AI-driven, user-guided assistance, aiming to enhance autonomy for visually impaired users.
The development could influence a broad ecosystem, including AI labs, major tech companies, smartphone hardware manufacturers, and assistive-technology startups, potentially spurring both competition and collaboration and accelerating investment in real-time perception and haptic guidance.
Key contributors include Vijaykrishnan Narayanan (team lead) and Ajay Narayanan Sridhar (lead student investigator) of Penn State, along with Mehrdad Mahdavi, Fuli Qiao, Nelson Daniel Troncoso Aldas, Laurent Itti, and Yanpei Shi, with collaborators from USC; the project is funded by the U.S. National Science Foundation.
The app connects to large-language models (LLMs) and vision-language models (VLMs) hosted on an external server, enabling real-time object recognition without preloading object models and addressing the flexibility, efficiency, and privacy limitations of prior systems.
By avoiding reliance on static object libraries, the project aims to improve accessibility and user independence in everyday navigation.
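None of the coverage describes NaviSense’s actual API, so the following Python sketch is only an illustration of the client-server pattern the articles describe: a phone sending one camera frame plus a spoken prompt to a server-hosted LLM/VLM and getting back either an object location or a clarifying question. The endpoint URL, request fields, and response schema here are all assumptions, not NaviSense’s implementation.

```python
# Hypothetical sketch of the client/server pattern described above.
# The endpoint URL, request fields, and response schema are assumptions.
import base64
import requests


def locate_object(frame_jpeg: bytes, spoken_prompt: str,
                  server_url: str = "https://example-vlm-server/locate"):
    """Send one camera frame plus the user's spoken prompt to a
    server-hosted LLM/VLM and return the model's answer."""
    payload = {
        "prompt": spoken_prompt,  # e.g. "Where is my coffee mug?"
        "image": base64.b64encode(frame_jpeg).decode("ascii"),
    }
    resp = requests.post(server_url, json=payload, timeout=5)
    resp.raise_for_status()
    result = resp.json()

    # Assumed response shape: either a bounding box for the target
    # or a clarifying question when the prompt is ambiguous.
    if result.get("needs_clarification"):
        return {"ask_user": result["question"]}
    return {"bbox": result["bbox"]}  # [x_min, y_min, x_max, y_max]
```

Offloading the models to a server in this way is what lets the app recognize arbitrary objects without preloading per-object models on the phone.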
Future directions include improving recognition speed and accuracy, integrating NaviSense-like capabilities into wearables such as smart glasses, belts, or footwear, and extending to tasks like real-time text reading, identifying people, providing contextual environmental information, and assisting with fine motor tasks; challenges include miniaturization, latency reduction, and affordability, with a similar headband-based system from Johns Hopkins cited as related clinical work.
NaviSense was unveiled at the ACM SIGACCESS ASSETS ’25 conference in Denver, where it won the Best Audience Choice Poster Award, signaling strong reception and validation from the accessibility and AI research communities.
The article frames NaviSense as a milestone in human-AI collaboration for accessibility and a paradigm shift from static aids toward dynamic, conversational assistance for independent navigation and environmental understanding, with commercialization efforts ongoing and potential partnerships, trials, and related clinical work anticipated.
Development was guided by extensive interviews with visually impaired participants to address real-world challenges and tailor features to user needs, including a conversational query capability that asks follow-up questions to refine searches.
In a controlled test, 12 participants used NaviSense and two commercial tools, with time-to-identification and object-detection accuracy measured.
NaviSense reduced search time and increased accuracy versus the commercial options, and participants reported a more intuitive experience with useful spatial cues (left/right/up/down and precise targeting).
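The spatial cues participants mention imply some mapping from the tracked hand position to the target’s position in the camera frame. The sources do not say how NaviSense computes its cues; the sketch below is a hypothetical illustration of one simple geometric approach, with the on-target threshold and vibration scaling chosen arbitrarily.

```python
# Hypothetical mapping from tracked hand and target positions (in image
# coordinates, where y grows downward) to directional and haptic cues.
# Thresholds and vibration scaling are illustrative assumptions.
import math


def guidance_cue(hand_xy, target_xy, frame_size, on_target_radius=0.05):
    """Return a spoken direction and a vibration intensity in [0, 1]."""
    w, h = frame_size
    # Normalised offset from hand to target (roughly -1..1 in each axis).
    dx = (target_xy[0] - hand_xy[0]) / w
    dy = (target_xy[1] - hand_xy[1]) / h
    distance = math.hypot(dx, dy)

    if distance < on_target_radius:
        return "on target", 1.0  # strong buzz: the hand has arrived

    # Speak whichever axis needs the larger correction.
    horizontal = "right" if dx > 0 else "left"
    vertical = "down" if dy > 0 else "up"
    direction = horizontal if abs(dx) >= abs(dy) else vertical

    # Vibrate more strongly as the hand gets closer (illustrative choice).
    intensity = max(0.1, 1.0 - distance)
    return direction, intensity


# Example: hand at the frame centre, target toward the upper right.
print(guidance_cue((320, 240), (500, 100), (640, 480)))  # ('up', ~0.59)
```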
With support from the U.S. National Science Foundation, the team is refining power efficiency and model performance as it moves toward commercial readiness.
The underlying models interpret natural language prompts in real time, enabling conversational interactions such as asking where a specific item is and receiving guidance, with follow-up clarification when needed.
Hand guidance toward objects, repeatedly requested by users and unavailable in off-the-shelf solutions, remains NaviSense’s key differentiator.
Summary based on 6 sources
Sources

Penn State University • Nov 24, 2025
AI tool helps visually impaired users ‘feel’ where objects are in real time | Penn State University
Interesting Engineering • Nov 24, 2025
New AI app helps visually impaired users find everyday objects with greater speed
Tech Xplore • Nov 24, 2025
AI tool helps visually impaired users 'feel' where objects are in real time
FinancialContent • Nov 24, 2025
AI Breakthrough Empowers Visually Impaired with Real-Time Object Perception, Ushering in a New Era of Independence