Artificial Intelligence and Detectron
Detectron is a software solution that reveals a person’s core traits even before they engage in open dialogue. This article discusses the development of a software module for analyzing physiological parameters and audio channels to assess a person’s emotional state via real-time video streams using artificial intelligence models.
The project aims to solve communication challenges in relationships with both strangers and acquaintances, without the need for direct conversation. The program operates online, measuring and analyzing human behavior to help users adapt to anyone and achieve goals in sales, management, monitoring, marketing, healthcare, and other fields. Detectron works in real time, requiring only 20 to 90 seconds to diagnose and address any people-related task set by the client. In most cases, a standard smartphone with a built-in camera (equivalent to an iPhone 6 or newer) is sufficient; occasionally, additional audio or video equipment may be needed. The software analyzes the video stream in real time and can be deployed as a standalone solution or integrated into the client’s infrastructure.
How Detectron Is Useful
The product predicts a person’s reaction to stimuli and questions based on their appearance and prevailing emotional state. It analyzes:
- Speech patterns
- Dialogue scenarios
- Presented objects
- Stress levels
These data points are fused, with one prediction confirming or adjusting another; the resulting architecture analyzes input data streams at about 60 frames per second.
For security and HR departments, a specialized software suite is offered to optimize and enhance the reliability of security and personnel management operations. The suite enables initial staff assessments for trustworthiness, motivation, and abilities, immediately filtering out unreliable candidates and highlighting those who require more attention or caution. It also tracks each employee’s dynamics, signaling deviations before they become organizational problems, and helps evaluate current staff for more effective team formation and talent pools, scaling these functions to any company size.
For sales departments, the suite assesses potential clients’ interests and values based on their psychological type, offering recommendations to immediately present the most relevant product, thus increasing client loyalty and interest. Managers can evaluate the potential effectiveness of communication based on the client’s mood and state, engaging only when results are likely.
Methods and Approaches for Achieving Desired Results
The current beta version recognizes emotions, psychological types, baseline behavior, stress levels, and some medical indicators. It builds 3–5 models predicting subject behavior using fast neural networks. A higher-order model confirms or neutralizes the conclusions of lower-order models. The initial dataset includes about 20,000 videos, with time intervals marked for characteristic reactions, absence of reactions, and atypical reactions (which the program ignores). The model is further enhanced by calculating physiological characteristics (pulse, breathing, blinking, facial expressions, gestures, voice timbre, speech pauses, etc.) and their rate of change.
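The confirm-or-neutralize interplay between the lower- and higher-order models can be illustrated with a minimal sketch. The channel names, scores, and agreement rule below are illustrative assumptions, not the product's actual architecture:

```python
# Illustrative sketch (assumed logic): several fast per-channel models each
# emit label scores; a higher-order rule keeps a conclusion only when an
# independent channel supports it, and neutralizes it otherwise.
def reconcile(predictions, support_threshold=0.5):
    """predictions: {channel: {label: confidence}} from lower-order models."""
    confirmed = {}
    for channel, scores in predictions.items():
        for label, conf in scores.items():
            peers = [p[label] for ch, p in predictions.items()
                     if ch != channel and label in p]
            # Confirm only if some other channel agrees strongly enough.
            if any(p >= support_threshold for p in peers):
                confirmed[label] = max(conf, confirmed.get(label, 0.0))
    return confirmed

face = {"stress": 0.8, "joy": 0.1}
voice = {"stress": 0.7}
gesture = {"joy": 0.2}
print(reconcile({"face": face, "voice": voice, "gesture": gesture}))
# {'stress': 0.8}  -- "stress" is corroborated by two channels; "joy" is not
```

Here the weak "joy" signal is dropped because no second channel supports it, mirroring the idea that a higher-order model neutralizes unconfirmed conclusions.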
The system operates as a web service, accepting video or frame sequences and returning a time-stamped array of numerical values in JSON format. An intermediate dispatcher distributes the load and presents results visually. Calculations are performed on servers equipped with Nvidia GTX 1080 GPUs.
The emotional state assessment subsystem is divided into two parts: one extracts behavioral components using neural networks and algorithms; the other evaluates emotional state based on these components, using a combination of approaches (notably from the International Academy for Lie Research) and other research and theoretical methods.
Supporting subsystems for structuring, storing, and analyzing results are developed as software services using the emotional assessment outputs. Technical tools and practices include:
- Facial emotion recognition using Active Appearance Model and FACS (Paul Ekman’s Facial Action Coding System)
- Pulse detection via PCA decomposition of video images
- Breathing rate detection using Eulerian Video Magnification
- Facial color change detection via PCA decomposition
- Gaze direction, body/limb/head position, clothing/accessory recognition, gait analysis (computer vision)
- Voice pattern recognition, amplitude-frequency analysis, MFCC (Mel-Frequency Cepstral Coefficients)
- Modern neural network design practices
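One of the listed techniques, pulse detection via PCA decomposition of video, can be sketched in a few lines. The sketch below assumes the face-region mean RGB value of each frame has already been extracted; it recovers the dominant periodic component with PCA and reads the heart rate off its frequency spectrum. This is a toy illustration of the principle, not the product's implementation:

```python
import numpy as np

def estimate_pulse_bpm(rgb_means, fps):
    """Estimate pulse from per-frame mean skin RGB values via PCA.

    rgb_means: (n_frames, 3) array of mean R, G, B over a skin region.
    """
    X = rgb_means - rgb_means.mean(axis=0)       # center each colour trace
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    signal = X @ vt[0]                           # first principal component
    # Dominant frequency within a plausible heart-rate band (45-180 bpm).
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    power = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= 0.75) & (freqs <= 3.0)
    return 60.0 * freqs[band][np.argmax(power[band])]

# Synthetic check: a 1.2 Hz (72 bpm) oscillation hidden in the green channel.
fps, seconds = 30, 10
t = np.arange(fps * seconds) / fps
rgb = np.random.default_rng(0).normal(0, 0.05, (len(t), 3))
rgb[:, 1] += 0.5 * np.sin(2 * np.pi * 1.2 * t)
print(round(estimate_pulse_bpm(rgb, fps)))  # 72
```

In practice the photoplethysmographic signal is orders of magnitude weaker than in this synthetic example, which is why the production pipeline also needs face tracking, skin-region selection, and motion compensation.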
Market Size, Industry Analysis, and Development Prospects
The emotion detection market is booming, projected to reach $19–37 billion by 2021–2022. Emotion Detection and Recognition Systems (EDRS) and affective computing are forming their own ecosystem within AI development, though market estimates vary because analysts use different metrics and calculation methods. According to MarketsandMarkets, the global emotion detection market was worth $6.72 billion in 2016 and is expected to reach $36.07 billion by 2021, a 39.9% compound annual growth rate. Reportlinker and Orbis Research offer more conservative forecasts: $29.17 billion (27.4% CAGR) and $19.96 billion (21.2% CAGR) by 2022, respectively. Gartner predicts that by 2021–2022, smartphones will know us better than our friends do and will interact with us on a subtle emotional level.
The industry’s key regions are Asia-Pacific, North America (USA and Canada), and the European Union. The most promising channels remain facial microexpression recognition and biosensors in wearables, followed by voice/speech analysis and eye tracking. Emotional and behavioral technologies are in demand across sectors, including healthcare. For example, Israel’s Beyond Verbal and the Mayo Clinic are searching for vocal biomarkers that not only detect emotions but also predict diseases such as coronary artery disease, Parkinson’s, and Alzheimer’s, linking emotional analysis to gerontology and aging research.
While B2B dominates (intelligent transport, retail, advertising, HR, IoT, gaming), there is also B2C demand: EaaS (Emotion as a Service) or cloud-based human data analytics lets users upload videos and receive emotional and behavioral statistics for each segment. In political debates, algorithms can detect even subtle emotional cues. Soon, emotion recognition will be standard in every smartphone, enabling smart interfaces that determine user state via a regular webcam. This is a promising niche, as emotion detection can be used commercially—from analyzing media content perception to criminal investigations.
Entertainment also benefits: newer iPhones use Face ID both for security and to create animated emoji that mimic the user’s expressions. Most new products in emotional science are based on the seven basic emotions and facial microexpressions, which reflect our feelings beyond conscious control. Technologies also analyze speech, voice, and gaze, with applications in psychiatry and criminal justice for detailed emotional state assessment.
Companies and teams can now use open scientific data on emotion recognition, combining it with technology to advance affective computing. FAANG (Facebook, Apple, Amazon, Netflix, Google) and tech giants like IBM have made significant contributions. The digitalization of society, proliferation of devices, and the ubiquity of images and video (billions uploaded daily), along with social media, enable effective extraction and analysis of emotional data for consumer and user profiling—provided it’s done legally and ethically.
Health and Healthtech
The health industry is rapidly adopting advanced data collection and analysis methods, as machine algorithms identify symptoms using thousands of similar cases. Mobile apps already analyze psycho-emotional states from photos and text, improving as users interact more. Devices can now detect your mood and adjust music, lighting, or even make coffee accordingly. More advanced systems assess fatigue or detect deviations from the norm, including diseases like Alzheimer’s or Parkinson’s, which affect facial muscles, eye movement speed, and subtle changes in voice and micro-movements long before symptoms appear.
In advertising, global retail chains are integrating online and offline channels, seeking to understand and predict customer desires. As neurointerfaces achieve sensitive emotion recognition, mall ads will be able to adapt to passersby’s moods in seconds. In 2017, a San Francisco research group trained an LSTM neural network to better recognize emotional content in text, enabling highly accurate mood detection in Amazon reviews and Rotten Tomatoes movie critiques, which helps improve service and predict product popularity.
Gaming Industry
The first Google Glass prototype aimed to revolutionize hands-free control, letting users read text on the lens simply by moving their eyes. While the device remained a prototype, eye-movement research shifted to gaming, opening new possibilities for interactive experiences.
Competitive Advantages and Global Comparisons
Loom.ai is pioneering virtual communication with animated, personalized 3D avatars, using Deep Learning and computer vision. Binary VR develops real-time facial detection technology, including facial recognition, landmark tracking, and expression recognition, generating 3D characters and VR avatars with AR filters. Affectiva, spun out of MIT Media Lab, is a leader in AI-based emotion recognition, understanding the crucial role of emotions in all aspects of life. Cyntient AI is a software platform that uses AI to simulate human behavior in video games and simulators, creating realistic, intuitive, and emotional virtual characters.
Target Consumer Segments and Demand Assessment
Target segments include sales departments, HR, security services, medical and educational institutions.
HR Departments
On average, hiring a new employee costs a company 1.5–2 times the employee’s monthly salary, including direct costs (job board fees, recruiter time, candidate evaluation, paperwork) and indirect costs (bonuses, extra vacation for overworked staff, workspace setup, severance, etc.). For example, hiring a mid-level developer with a 60,000 RUB salary costs about 100,000 RUB. If the wrong candidate is hired, costs can double due to failed projects and reputational damage. Adaptation takes 3–6 months, during which efficiency is 50–70%, and colleagues spend extra time on training and communication. If an employee leaves during probation, total losses can reach 400,000 RUB or more.
The Russian recruitment market began to grow in 2017 after a long stagnation, with agencies filling more vacancies and the overall market reaching 66.3 billion RUB. Demand for external recruiters increased in industries experiencing economic recovery, especially for specialists and entry-level positions.
The sales, HR, security, medical, and educational sectors all have a concrete interest in this project, as it reduces candidate evaluation costs and increases decision speed and accuracy. For sales managers, the inability to quickly connect with clients leads to lost business. For security services, polygraph tests take hours, while the program detects deception in under 3 minutes and can monitor employees’ psycho-emotional state around the clock. In medicine, it can provide a preliminary diagnosis in 2 minutes from a patient’s video. In education, it enables continuous monitoring of audience engagement in both offline and online learning.
The product addresses market needs for:
- Increasing sales through individualized approaches without lengthy staff training
- Reducing HR staff by shortening interview times without sacrificing candidate assessment quality
- Obtaining objective client feedback without direct questions
- Detecting and preventing fraud schemes
- Automatic monitoring of employees’ predisposition to deviant behavior
Key application areas include:
- Business: Replacing lie detection specialists at meetings to assess potential partnerships
- Insurance: Detecting fraud in claims processing
- Banking: Creating online credit scoring systems
- Auditing: Enhancing traditional document checks with behavioral analysis
- Transportation: Improving safety and anti-terrorism measures
- Hospitality: Preventing crimes in hotels
- Recruiting: Helping recruiters detect deception and uncover hidden issues in candidates’ backgrounds
Technical Specifications
The video stream is processed online, with analysis results returned several times per second, including:
- Current emotional state and intensity (emotion/strength scale)
- Current emotional state on a positive-negative/arousal-inhibition scale
- Predicted reaction to stimuli based on emotional state and appearance/behavior
- Comparison of predictions to confirm or neutralize reactions
- Graphs of emotional state changes over time
- Graphs of pulse, blinking, breathing changes, and stress-related facial reactions
- Analysis of speech rate, pitch, and timbre
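Of the speech measures above, pitch is the simplest to sketch. The toy estimator below finds the fundamental frequency of a voiced segment by locating the autocorrelation peak inside a plausible vocal range; it is an illustration of the principle, not the product's speech pipeline:

```python
import numpy as np

def estimate_pitch_hz(samples, sr, fmin=60.0, fmax=400.0):
    """Fundamental frequency of a voiced segment via autocorrelation."""
    x = samples - samples.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags 0..N-1
    lo, hi = int(sr / fmax), int(sr / fmin)             # vocal-range lags
    lag = lo + np.argmax(ac[lo:hi + 1])                 # strongest period
    return sr / lag

sr = 8000
t = np.arange(sr // 2) / sr
tone = np.sin(2 * np.pi * 200.0 * t)          # 200 Hz test tone
print(round(estimate_pitch_hz(tone, sr), 1))  # 200.0
```

Real speech would additionally require framing, voicing detection, and smoothing across frames; MFCCs (also listed above) capture timbre rather than pitch and are computed from the mel-scaled log spectrum instead.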
Output information includes:
- Emotional state and intensity (normalized matrix of factors)
- Stress level: pulse, breathing, and blinking frequency, facial dynamics (numerical values); anxiety (yes/no)
- Predicted reaction to stimuli (text)
Parameters:
- Continuous input stream processing at a minimum resolution of 1280 Ă— 720
- Processing speed of at least 40 frames per second
- Prediction accuracy of at least 90%
- Numerical data output at least 4 times per second
- Graph display interval of at least 30 seconds
Prototypes for accuracy improvement tools are ready, with working testing and training modules.
Product requirements:
- Camera resolution: 1280 Ă— 720 or higher
- Minimum system: cloud storage, computing power, and RAM (purchase or rental)
- Full access to all software functions
Non-Functional Requirements
Security requirements include three main categories: access control, private data handling, and risk mitigation from external attacks.
Intellectual property plans: A patent for the “Method for Measuring and Analyzing Human Behavior” is planned for 2020–2021.
By 2021, further development is planned for:
- A subsystem using video and audio data to assess emotional state
- A subsystem structuring emotional state via context (structured conversation, interview, arbitrary context modules)
- A subsystem for regular collection and storage of emotional state data
- A subsystem for analyzing collected data to detect anomalies
A software suite for medical services is also planned, aimed at optimizing and improving the reliability of healthcare institutions’ operations. The suite will:
- Detect deviations or their absence (for diseases the neural network is trained on)
- Maintain statistics, save states, analyze changes over time, and signal potentially critical changes (for trained diseases)
- Scale these functions to any region and number of patients
- Scale to any diseases or disorders suitable for additional diagnostics via facial/body movements, skin condition, and other external signs