The Science of Human Data Collection.

What is Human Data Collection? 

As the world around us becomes more digitally focused, we increasingly interact with machines, data, and each other through touchscreens, gestures, facial recognition, voice commands, and beyond. This revolution is driven by artificial intelligence (AI) and machine learning (ML) algorithms that rely on accurate ground truth data to recognize the real world effectively.

ML and AI programs need high-quality data at scale to realistically improve efficiency. Computer vision is closing the gap between real-world images and perceived images captured by cameras. Collection initiatives for verbal speech, human gestures, inanimate objects, and spaces are equally important.

Human Data Collection helps capture the granular behavioral data (physical, geographical, cultural and more) that can be used to tune the user experience for specific markets, cultures, and demanding applications. 

Examples of Human-driven applications: 


• Emotion Recognition. This technology allows software to “read” the emotions on a human face using advanced image processing or audio data processing. We are now at the point where we can capture “micro-expressions,” subtle body language cues, and vocal intonation that convey a person’s feelings. Users may include law enforcement officers who want to learn more about someone during an interrogation. It also has a wide range of applications for marketers.

• Image Recognition. Image recognition is the process of identifying and detecting an object or feature in a digital image or video, and AI is increasingly being stacked on top of this technology to great effect. AI can search social media platforms for photos and compare them to a wide range of data sets to decide which ones are most relevant during image searches. Image recognition technology can also be used to detect license plates, diagnose diseases, analyze clients and their opinions and verify users based on their faces. 

• Biometrics. This technology can identify, measure and analyze human behavior and physical aspects of the body’s structure and form to allow for more natural interactions between humans and machines. 

Why is Human Data Collection important? 

Body language can be more powerful than the spoken word. How people react to and behave with products can reveal richer information than what they say. The spectrum of behavior, such as gestures, facial expressions, and eye movement, can express the user experience with tremendous depth and, more importantly, can contradict what an individual is saying and so reveal a fuller truth. Properly recognizing human behaviors such as gestures and facial expressions is especially critical in understanding someone with limited verbal communication skills, such as young children or people with speech disabilities.

The challenges of human data collection: 

• How much to do and how much is enough: It’s important to determine how much data is enough for the algorithms to work correctly. Human data, with its variation in body language, hand gestures, and facial expressions, can be confusing and difficult for computers to learn, which makes the right amount of data hard to determine.

• Human data capture is difficult: The most critical step is to fully understand what kind of human data is needed, and all the associated parameters, before executing. Once you have decided how many participants an application needs, it takes special planning and research to determine how and where to find people who exhibit the varying hand gestures, body language, facial expressions, and more. Failing to work through this process methodically will only cost extra cycles, time, and money to get the right data.

• There is no standard: A common misconception is that there is, or should be, a standard for data capture. However, this way of thinking does not consider that each project is unique, based on the product and the particular scenarios needed to optimize it. You might standardize execution, but not the planning and design of the data capture process.
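One way to make the "how much is enough" question concrete is to check whether every combination of attributes a project cares about has a minimum number of samples. The sketch below is only an illustration under stated assumptions: the function name `coverage_gaps`, the dimension names, and the threshold are hypothetical, and the right dimensions and minimums depend on each project.

```python
from collections import Counter
from itertools import product


def coverage_gaps(records, dimensions, min_per_cell):
    """Return the cells (combinations of dimension values) that have
    fewer than min_per_cell collected samples, with their current counts.

    records: list of dicts, one per collected sample (hypothetical schema).
    dimensions: dict mapping a dimension name to its required values.
    """
    # Count the samples collected for each combination of values.
    counts = Counter(tuple(r[name] for name in dimensions) for r in records)
    # Enumerate every required combination, including ones never seen.
    return {
        cell: counts.get(cell, 0)
        for cell in product(*dimensions.values())
        if counts.get(cell, 0) < min_per_cell
    }


# Illustrative data only; real projects define their own dimensions.
dimensions = {"age_band": ["18-34", "35-54"], "gesture": ["swipe", "pinch"]}
records = [
    {"age_band": "18-34", "gesture": "swipe"},
    {"age_band": "18-34", "gesture": "swipe"},
    {"age_band": "35-54", "gesture": "pinch"},
]
gaps = coverage_gaps(records, dimensions, min_per_cell=2)
for cell, count in gaps.items():
    print(cell, "has", count, "sample(s)")
```

A report like this does not say how much data an algorithm needs, but it shows which participant groups and behaviors are still under-represented before more collection cycles are spent.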

The Q Analysts Approach: 

At Q Analysts, we are world leaders in Data Collection and are the only firm providing a complete end-to-end solution, from strategy to capture to ingestion to annotation and tagging. We have refined our approach through years of experience in this space to deliver superior results. Elements of our best practices include:

• Partnership: When beginning a new project, it’s important to make a full evaluation at the outset, and that means asking the right questions. Identifying the right number of participants for a project is a first step. Gathering information about the body language, hand gestures, facial expressions, and other human data patterns involved gives us a thorough understanding of our clients’ requirements. At that point, we can guide them in determining the optimal parameters for capturing the highest-quality, best-fit human data and ensure that the budget is scoped appropriately.

• Experience: Q Analysts has significant expertise in designing and delivering a variety of data collection initiatives in the areas of verbal speech, human gestures, inanimate objects, and spaces. The knowledge base we have gained from designing and executing hundreds of scenarios creates efficiencies that benefit our clients, saving them money in the long run. This knowledge, in combination with our best practices for creating a successful human data collection program, offers assurance that we are delivering the highest-quality data that requires no additional verification.

• Design and Refine: Q Analysts is well-equipped to design the right program from the ground up, customized to each client. One size does not fit all. This experience also allows us to think creatively and refine the process. Before launching any project, we run a pilot to ensure that the project is on track to achieve the best results.

• Logistics and Execution: The logistics of human data collection can be very complex, requiring knowledge of, for example, the demographics needed and where to find them. Once the required demographic is identified, how do you reach those individuals and incentivize them to participate? This is just the tip of the logistics iceberg. Our many years of execution experience, along with a network of partners who provide a variety of resources, allow us to do the heavy lifting on our clients’ behalf. As AI systems get more sophisticated and cater to a growing worldwide audience, the demographic needs are getting more and more granular.

• Data Privacy & Security: This is one of the biggest areas of risk for firms consuming data from human participants. Several recent news articles have exposed practices that have lawmakers around the world taking note of potential privacy and security risks associated with human data collection. Mitigating risk and exposure in this area takes experience and an understanding of best practices for handling sensitive data. Q Analysts’ approach to new projects accommodates the specific needs of each project while meeting the strict security policies outlined below.

○ Project devices and equipment that contain sensitive and confidential data must always be secured. Capture and storage devices must also be securely stored away when not in use.

○ Participants involved in any data collection program or project should be required to sign an NDA prior to participating. They should be aware that the data they are providing is going to be used in some AI application but also be assured that it will be handled securely and de-identified at some stage, if not at the collection stage itself.  

○ When capture and storage devices are in use, they must always be supervised and monitored during daily operation.

○ Participants are not allowed to view the devices’ back-end systems, to ensure complete data privacy. Guidelines restricting access to collected data should also be in place.

○ Data should not be shared with any parties who do not have permission to access the data and should be accessed only when necessary to perform job functions.  

○ Any moving or manual transfer of data (e.g., physically moving drives to other locations or handing off forms) must first be logged in the appropriate tracking form and managed by authorized personnel. Any transfer of data over the internet should be performed only on a secured network and overseen by authorized personnel.

○ Data security policies must be held to a strict standard and should always be followed. A breach of data security processes could result in employee termination or even a complete project shutdown.

• Q Analysts’ representative human data projects include:

○ National project to capture 40,000 unique individuals performing scripted gestures and speech patterns.

○ Project to capture extensive hand gesture movements for 1,000 unique individuals for a future technology product.

○ National project to collect demographic-based head and facial gestures for 12,000 individuals.
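Two of the data privacy and security practices listed above, de-identifying participant data and logging every transfer under authorized personnel, can be sketched in code. This is a minimal illustration, not a description of Q Analysts’ actual tooling: the names `pseudonymize`, `TransferLogEntry`, and `log_transfer` are hypothetical, and keyed hashing is only one possible technique. It produces pseudonyms rather than fully anonymous data, so the secret key would itself need to be protected.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timezone


def pseudonymize(participant_id: str, secret_key: bytes) -> str:
    """Replace a participant identifier with a keyed SHA-256 digest.

    The same ID and key always give the same pseudonym, so records can
    still be linked without storing the original identifier.
    """
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


@dataclass(frozen=True)
class TransferLogEntry:
    """One row of a (hypothetical) data transfer tracking form."""
    asset: str
    source: str
    destination: str
    authorized_by: str
    logged_at: str


def log_transfer(log, asset, source, destination, authorized_by, authorized_personnel):
    """Record a transfer before it happens; refuse unauthorized requesters."""
    if authorized_by not in authorized_personnel:
        raise PermissionError(f"{authorized_by} is not authorized to move data")
    entry = TransferLogEntry(
        asset=asset,
        source=source,
        destination=destination,
        authorized_by=authorized_by,
        logged_at=datetime.now(timezone.utc).isoformat(),
    )
    log.append(entry)
    return entry


# Illustrative values only.
transfer_log = []
pseudonym = pseudonymize("participant-001", secret_key=b"example-key")
log_transfer(transfer_log, "drive-07", "Site A", "Site B", "alice", {"alice", "bob"})
```

Logging before the move, rather than after, mirrors the requirement that transfers "must first be logged in the appropriate tracking form."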

Conclusion: 


As we become more digitally focused, our interaction with machines is increasing. Touchscreens, gestures, facial recognition, voice commands, and more are examples of AI and ML algorithms at work, illustrating this paradigm shift. Ground truth data is the foundation on which these products are developed. When you partner with Q Analysts, we help you understand every behavioral variable (physical, geographic, cultural, and more) so that you collect human data that accurately reflects the actual user experience.

Q Analysts offers a unique end-to-end suite of Ground Truth Data Services for AI and ML product development, from strategy to capture to annotation and tagging, all done in-house. Learn more about Q Analysts Ground Truth Data Services. The difference between capturing data and capturing high-quality data can make or break product performance in the real world. Designing and executing a high-quality data collection program will generate high-quality results.
