Voice Biometrics Glossary Of Terms

What is a biometric?

A biometric is a measurement of a physical feature or repeatable action of an individual. There are many forms of biometric measurement. Examples are fingerprints, retinal/iris scans, facial recognition, hand geometry, signatures, DNA, and of course voiceprints. Voice biometrics is the measurement of how a person speaks.

What is Voice Biometrics?

Voice biometrics is the measurement of how a person speaks. Voice biometric systems use the measurement of spoken voice to distinguish between people, usually to validate or invalidate a claimed identity or determine identity from a group of people.

What is Classification in a voice biometric system?

Classification is the process for categorizing a person by evaluating voice samples from that person. Examples include male or female, young or old, or emotional state. Classification systems can be used within Identification systems to help reduce the number of comparisons necessary and to increase confidence in scoring. For example, if the person’s gender is clearly male, then it is not necessary to compare against voiceprints that are known to be female.
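The pre-filtering idea above can be sketched in a few lines of Python. This is only an illustration: the record layout, the gender labels, and the function name candidates_for are assumptions made for the example and do not describe any particular product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnrolledVoiceprint:
    identity: str
    gender: Optional[str]  # "male", "female", or None when unknown (assumed labels)


def candidates_for(sample_gender, enrolled):
    """Return the voiceprints that still need a full comparison."""
    if sample_gender is None:
        # The classifier was not confident, so nothing can be skipped.
        return list(enrolled)
    # Keep matching voiceprints plus any whose gender was never classified.
    return [vp for vp in enrolled if vp.gender in (sample_gender, None)]


enrolled = [
    EnrolledVoiceprint("alice", "female"),
    EnrolledVoiceprint("bob", "male"),
    EnrolledVoiceprint("carol", None),
    EnrolledVoiceprint("dave", "male"),
]
shortlist = candidates_for("male", enrolled)
print([vp.identity for vp in shortlist])  # ['bob', 'carol', 'dave']
```

Voiceprints with an unknown classification are kept in the shortlist, so the filter only removes comparisons that are clearly unnecessary.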

What is Enrollment in a voice biometric system?

In the context of voice biometrics, enrollment is the capture of voice samples from an individual with the intent of creating a voiceprint from the unique characteristics of the person’s speech patterns. Enrollment may be either Active or Passive. Active Enrollment means the person is knowingly engaged in the enrollment process and is speaking what the biometric system asked the person to speak. Passive Enrollment means the person’s voice samples are captured and delivered to the voice biometric system in the general course of other activity, such as when speaking with a call center agent. Without some notification, the person will not be aware that his/her voice is being captured. Passive Enrollment generally requires a longer duration of voice samples to fully represent the person’s voice characteristics, such as in the SpeakFreely™ method.

What is Equal Error Rate (EER) in a voice biometric system?

The Equal Error Rate of any security tool is the operating point where the percentage of false acceptances is equal to the percentage of false rejections. The lower the EER value, the better, as it is desirable to be both very good at recognizing valid system users and very good at screening out imposters and fraudsters. EER is the most common measure used to judge the accuracy of biometric and other security systems. It can be misleading because it is often a laboratory measurement based on a finite set of samples. For example, changing the data used for an EER test by ignoring known bad samples may significantly alter the EER.
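As a rough illustration, the EER can be approximated from two lists of match scores by sweeping a decision threshold and finding where the two error rates meet. The sketch below is in Python and rests on assumptions: scores are "higher is better", a sample is accepted when its score is at or above the threshold, and all score values are made up for the example.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) as fractions when scores >= threshold are accepted."""
    far = sum(1 for s in impostor if s >= threshold) / len(impostor)
    frr = sum(1 for s in genuine if s < threshold) / len(genuine)
    return far, frr


def approximate_eer(genuine, impostor):
    """Try every observed score as a threshold; return (eer, threshold)
    at the point where FAR and FRR are closest to each other."""
    best = None
    for threshold in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


genuine = [0.91, 0.85, 0.78, 0.66, 0.95]   # valid users (made-up scores)
impostor = [0.12, 0.40, 0.71, 0.33, 0.05]  # imposters (made-up scores)
print(approximate_eer(genuine, impostor))

# Dropping the two "hard" samples changes the result considerably.
cleaned_genuine = [s for s in genuine if s != 0.66]
cleaned_impostor = [s for s in impostor if s != 0.71]
print(approximate_eer(cleaned_genuine, cleaned_impostor))
```

With these made-up scores the first call reports an EER of 20% at a threshold of 0.71, while the "cleaned" data reports 0%. This is the effect described above: the choice of test samples can change the reported EER substantially.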

What is False Acceptance Rate (FAR) in a voice biometric system?

In biometric and other security systems, the false acceptance rate (or FAR) is the percentage of times when the system will incorrectly let an imposter or fraudster in as a valid user. This scenario is sometimes also referred to as a 'Type II' error. Giving unauthorized users access to any system can have profound implications, so it is very important to tune biometric systems to low FAR levels.

What is False Rejection Rate (FRR) in a voice biometric system?

In biometric and other security systems, the false rejection rate (or FRR) is the percentage of times when the system will incorrectly reject a valid user. This scenario is sometimes also referred to as a 'Type I' error. Rejecting a valid user is an inconvenience and this can have implications for long-term user acceptance. To help manage these types of errors, tuning is recommended, along with offering retries.
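The effect of tuning on FAR and FRR can be illustrated with a short Python sketch. It assumes that a higher score means a better match and that a sample is accepted when its score is at or above the threshold. The scores are invented for the example.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) as fractions for a given acceptance threshold."""
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


genuine = [0.91, 0.85, 0.78, 0.66, 0.95]   # valid users (made-up scores)
impostor = [0.12, 0.40, 0.71, 0.33, 0.05]  # imposters (made-up scores)

for threshold in (0.5, 0.7, 0.8):
    far, frr = far_frr(genuine, impostor, threshold)
    print(f"threshold={threshold}: FAR={far:.0%}, FRR={frr:.0%}")
```

Raising the threshold lowers FAR but raises FRR, which is why a strict threshold is usually paired with retries for valid users.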

What is Feature Extraction in a voice biometric system?

When audio samples are submitted to a voice biometric engine during enrollment, verification, or identification, the engine analyzes the voice and extracts unique vocal characteristics using sophisticated audio signal processing techniques. The process that performs these tasks is commonly referred to as 'feature extraction'.

What is Identification in a voice biometric system?

Identification is the process of determining the identity of a person by evaluating voice samples from that person and comparing against a database of voiceprints from two or more people. In an Identification application, audio is captured and passed to the voice biometric system. The voice biometric system extracts information from the voice samples that it then compares to a set of previously enrolled voiceprints. The best match from the set of voiceprints is returned as the identity of the person. If the identity is known for certain to be in the set of voiceprints, then this is a Closed Set Identification. The best match is most likely the identity of the person.

However, in many applications, such as a blacklist fraud detection application, the person may not have a voiceprint enrolled in the database at all. In a fraud blacklist, for example, 90% of the time the voice sample is not from a fraudster. When it is not certain that the person is in the set of voiceprints, this is called an Open Set Identification. In Open Set Identification, the best match is simply the closest of the tested voiceprints. The application must then also look at the absolute score of the match to determine if it is strong enough to warrant further research into whether this was in fact that person. Identification, whether Closed Set or Open Set, requires comparisons against multiple voiceprints. This is a 1:N or 1-to-many computation. Because the amount of computation grows with the number of voiceprints compared, the system design needs close attention in order to achieve a particular response time from the voice biometric system.
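The difference between Closed Set and Open Set Identification can be sketched in Python as follows. The function name identify, the score values, and the threshold of 0.8 are assumptions for illustration. The scores stand in for whatever a voice biometric engine would return for each enrolled voiceprint.

```python
def identify(scores, open_set_threshold=None):
    """Pick the best match from a dict of identity -> match score.

    With no threshold this behaves as Closed Set Identification: the best
    match is always returned. With a threshold it behaves as Open Set
    Identification: a weak best match is reported as "no one" (None).
    """
    best_identity = max(scores, key=scores.get)
    best_score = scores[best_identity]
    if open_set_threshold is not None and best_score < open_set_threshold:
        return None, best_score
    return best_identity, best_score


# One comparison per enrolled voiceprint (1:N); the values are made up.
scores = {"fraudster_17": 0.42, "fraudster_03": 0.38, "fraudster_22": 0.15}

print(identify(scores))                          # ('fraudster_17', 0.42)
print(identify(scores, open_set_threshold=0.8))  # (None, 0.42)
```

In the Open Set call the best match is still fraudster_17, but its absolute score is too weak to treat as an identification.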

What is Interactive Voice Response (IVR) in the context of voice biometrics?

Interactive Voice Response refers to technology in which a person is able to interact with a computer application through a voice telephone call. The person either uses the telephone keypad or spoken voice to provide input. The computer application issues commands that cause recorded or synthesized computer voice to play back to the person. The use of voice biometric authentication in conjunction with IVR systems is a rapidly growing practice. IVR systems provide a convenient means to capture voice samples. Furthermore, IVR systems are often the entry point for contacting a business over the phone, making the IVR system the ideal point to automate the authentication process.

What is Liveness Testing?

One of the potential weak points of biometric systems is susceptibility to a previously recorded occurrence of the biometric being used as the input to a new verification transaction. For example, with facial biometrics, a fraudster may attempt to use a picture of the correct person. Therefore, many biometric systems perform "liveness testing" in order to test whether the biometric in question is in fact a live biometric rather than a recorded one. For voice, companies use different techniques, such as listening for exactly the same audio as what was submitted in an earlier transaction or identifying noise from recording devices. For applications that are at risk of recorded playback attacks, Voice Biometrics Group (VBG) recommends using RandomPIN™, which inherently includes liveness testing in the process, rather than relying on separate liveness testing.
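One of the techniques mentioned above, checking for exactly the same audio as an earlier transaction, can be sketched in Python. This is a deliberately simple illustration that assumes audio arrives as raw bytes. It only catches byte-identical resubmissions, so a recording that has been re-encoded or replayed through a speaker would not be detected this way. The class name ReplayDetector is hypothetical.

```python
import hashlib


class ReplayDetector:
    """Remembers a digest of every submitted sample and flags exact repeats."""

    def __init__(self):
        self._seen_digests = set()

    def is_replay(self, audio_bytes: bytes) -> bool:
        digest = hashlib.sha256(audio_bytes).hexdigest()
        if digest in self._seen_digests:
            return True  # identical audio was submitted before
        self._seen_digests.add(digest)
        return False


detector = ReplayDetector()
print(detector.is_replay(b"first call audio"))   # False
print(detector.is_replay(b"first call audio"))   # True
print(detector.is_replay(b"second call audio"))  # False
```

Live speech is never bit-for-bit identical between two utterances, which is what makes an exact repeat suspicious.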

What is Multi-Factor Authentication (MFA)?

An authentication factor refers to a piece of information and/or a technique used to authenticate a person’s identity. Factors are something you have, such as a picture ID or token, often in the form of a key fob; something you know, such as a password, PIN, or answer to a shared secret; or something you are, such as a biometric in the form of a fingerprint, voiceprint, etc. Multi-Factor Authentication, or MFA, is a system where multiple factors are obtained during a single session to authenticate an individual, resulting in greater confidence in the authentication.
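A small Python sketch of the factor categories follows. It assumes the common reading that the factors in an MFA session should come from different categories, so a password plus a PIN would not count. The check names and the mapping are invented for the example.

```python
# Hypothetical mapping from a passed check to its factor category.
FACTOR_CATEGORY = {
    "password": "something you know",
    "pin": "something you know",
    "key_fob_token": "something you have",
    "picture_id": "something you have",
    "voiceprint": "something you are",
    "fingerprint": "something you are",
}


def is_multi_factor(passed_checks):
    """True when the passed checks span at least two factor categories."""
    categories = {FACTOR_CATEGORY[check] for check in passed_checks}
    return len(categories) >= 2


print(is_multi_factor(["password", "voiceprint"]))  # True
print(is_multi_factor(["password", "pin"]))         # False
```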

What is RandomPIN™ in the context of voice biometrics?

RandomPIN™, more generically referred to as Random Number, is a voice biometric verification method provided by Voice Biometrics Group where a person enrolls by speaking a set of specific sequences of numbers, then later verifies by speaking a randomized set of four or five digits. Because the person in the verification process will need to speak a different sequence of digits each time, RandomPIN™ is harder to break compared to a fixed passphrase.
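The general random-number idea can be sketched in Python as a challenge and a two-part check. This is not Voice Biometrics Group's implementation. The function names, the threshold of 0.8, and the assumption that a speech recognizer supplies the spoken digits while a voice biometric engine supplies the score are all illustrative.

```python
import secrets


def new_challenge(length=4):
    """Generate a fresh random digit sequence for the person to speak."""
    return "".join(secrets.choice("0123456789") for _ in range(length))


def check_response(challenge, recognized_digits, voice_score, threshold=0.8):
    """Accept only if the spoken content AND the voice both match."""
    content_ok = recognized_digits == challenge
    voice_ok = voice_score >= threshold
    return content_ok and voice_ok


challenge = new_challenge(5)
print("Please say:", " ".join(challenge))

print(check_response("4821", "4821", 0.93))  # True: right digits, right voice
print(check_response("4821", "4821", 0.41))  # False: right digits, wrong voice
print(check_response("4821", "7305", 0.93))  # False: right voice, stale digits
```

The last case is the one that matters for playback attacks: a recording of an earlier session carries the right voice but almost certainly the wrong digits.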

What is SpeakFreely™ in the context of voice biometrics?

SpeakFreely™ is a voice biometric verification method provided by Voice Biometrics Group where a voiceprint is created from voice samples where the content of the spoken voice does not matter. In order to capture enough phonetic diversity to fully characterize a person’s voice, the voice samples must be of sufficient duration, typically at least 30 seconds of spoken voice, not including silence that occurs in speech. Verification against the voiceprint created using SpeakFreely™ is then possible with a voice sample only a few seconds in duration.
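The "30 seconds not including silence" requirement can be illustrated with a Python sketch that adds up only the segments in which a given person is speaking. The segment format (speaker, start, end) and the names used are assumptions for the example. In practice the segments would come from some form of voice activity detection.

```python
MIN_ENROLLMENT_SPEECH_SECONDS = 30.0  # the typical minimum quoted above


def net_speech_seconds(segments, speaker):
    """Sum the duration of one speaker's segments, ignoring everything else."""
    return sum(end - start for who, start, end in segments if who == speaker)


def ready_to_enroll(segments, speaker, minimum=MIN_ENROLLMENT_SPEECH_SECONDS):
    return net_speech_seconds(segments, speaker) >= minimum


# (speaker, start_second, end_second); gaps between segments are silence.
segments = [
    ("caller", 0.0, 12.5),
    ("agent", 13.0, 30.0),
    ("caller", 31.0, 45.0),
    ("caller", 50.0, 55.0),
]

print(net_speech_seconds(segments, "caller"))  # 31.5
print(ready_to_enroll(segments, "caller"))     # True
print(ready_to_enroll(segments, "agent"))      # False
```

The call in this example lasts 55 seconds, yet only the caller has accumulated enough net speech to enroll.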

What is Automatic Speech Recognition (ASR)?

Automatic Speech Recognition, also known as voice recognition, is a technology where spoken words are recognized by the system for specific content. ASR is used in IVR systems, smartphones, cars, computers, and televisions for commands or dictation. ASR is useful in the context of voice biometrics for recognizing and validating the content of spoken voice in text-dependent interactions.

What is a Playback Attack in the context of voice biometrics?

Voice biometric systems rely on analyzing voice samples. A Playback Attack is when an impostor attempts to use a recording of a person's voice to verify claimed identity in a voice biometric system.

What is Speaker Identification?

Speaker Identification is the process of determining the identity of a person from a group of people using spoken voice.

What is Speaker Verification?

Speaker Verification is the process for confirming the identity of a person using spoken voice. It is synonymous with Verification in the context of voice biometrics.

What is Text-Dependent voice biometrics?

A text-dependent voice biometric interaction means that the content of what the person is speaking is essential to the biometric analysis. Imposing a text-dependent constraint, such as a fixed passphrase, is useful to achieve the shortest time necessary for enrollment. It can also be useful to determine if a live person is speaking rather than a recording, such as in Voice Biometrics Group’s RandomPIN™ method, where both the content and the voice must match. A strong voice biometric platform should support both Text-Independent and Text-Dependent interactions.

What is Text-Independent voice biometrics?

A text-independent voice biometric interaction means that the content of what the person speaks is not relevant to the biometric analysis. That is, the voice biometric system is able to analyze the voice regardless of what the person is saying. Enrolling a person’s voice with no constraints on the content means that more speech is necessary to capture the range of characteristics of the person’s voice to create a voiceprint. Therefore, most biometric companies will say that a minimum of 30 seconds of speech is necessary to create a voiceprint. As a rule of thumb, two to three minutes of a typical two-party, balanced conversation will result in 30 seconds of usable speech from either party. A strong voice biometric system should support Text-Independent interactions as well as Text-Dependent interactions. Voice Biometrics Group’s SpeakFreely™ method is a pure Text-Independent form of biometric.

What is Verification in the context of voice biometrics?

Verification in the context of voice, also known as voice authentication or speaker verification, refers to the process of verifying a person’s identity by evaluating samples of a person’s voice and comparing to a voiceprint known to be from that person. In a voice verification application, users make an initial claim of identity, often by entering a user id and password. The application then prompts users for a speech sample and sends it to a voice authentication system. The voice authentication system extracts unique vocal features from the sample, compares them to the stored voiceprint for that user, and then returns the result to the application. The application can then decide whether to "pass" the user and let them continue or "fail" them and perform some alternate process. Verification is a 1:1 process of collecting a voice sample and comparing against a single voiceprint.
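The 1:1 flow described above can be sketched in Python. Everything specific here is an assumption for illustration: the feature vectors are tiny made-up lists, cosine similarity stands in for whatever scoring a real engine uses, and the threshold of 0.9 is arbitrary.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two equal-length feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def verify(claimed_identity, sample_features, voiceprints, threshold=0.9):
    """Compare the sample against the single voiceprint of the claimed identity."""
    voiceprint = voiceprints.get(claimed_identity)
    if voiceprint is None:
        return "fail"  # nothing enrolled for this claim
    score = cosine_similarity(sample_features, voiceprint)
    return "pass" if score >= threshold else "fail"


voiceprints = {"user42": [0.2, 0.7, 0.1]}  # stand-in for a stored voiceprint

print(verify("user42", [0.21, 0.69, 0.12], voiceprints))  # pass
print(verify("user42", [0.90, 0.10, 0.40], voiceprints))  # fail
print(verify("user99", [0.21, 0.69, 0.12], voiceprints))  # fail
```

Only one comparison is made per attempt, no matter how many people are enrolled, which is what distinguishes Verification from Identification.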

What is a Voiceprint?

A voiceprint is a mathematical representation of the unique physiological and behavioral features of a person's voice stored in electronic format. A voice biometric system creates a voiceprint from samples of a person’s voice. A voiceprint is not a recording or file that can be played back or otherwise listened to. Rather, a voiceprint is derived from audio analysis and statistical modeling of vocal features and is then stored in a format proprietary to each voice biometric company. Voiceprints cannot be reverse-engineered back into original speech, which gives voiceprints very high security with respect to information storage concerns.

What is a Voiceprint API?

An API is an Application Programming Interface, specifically designed to enable a software developer to use the features of a product. A Voiceprint API provides the software developer the means to create and delete identities in a voice biometric system, create and delete voiceprints for the identities, and perform verifications against those identities. Depending on the features of the voice biometric system, the Voiceprint API may also allow the software developer to perform Identification, Classification, and other requests. The software developer must have the means to submit audio files to the Voiceprint API.
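The operations listed above can be pictured with a toy in-memory stand-in, written in Python. This is not any vendor's API. The class name, method names, and behavior are hypothetical, and a SHA-256 digest is used only as a placeholder so the example runs. A real engine would build a statistical model of vocal features and would tolerate natural variation between samples.

```python
import hashlib


class InMemoryVoiceprintService:
    """Hypothetical stand-in showing the call sequence, not real biometrics."""

    def __init__(self):
        self._store = {}  # identity -> placeholder voiceprint, or None

    def create_identity(self, identity):
        self._store.setdefault(identity, None)

    def delete_identity(self, identity):
        self._store.pop(identity, None)

    def create_voiceprint(self, identity, audio):
        if identity not in self._store:
            raise KeyError(f"unknown identity: {identity}")
        self._store[identity] = self._placeholder_model(audio)

    def delete_voiceprint(self, identity):
        if identity in self._store:
            self._store[identity] = None

    def verify(self, identity, audio):
        stored = self._store.get(identity)
        return stored is not None and stored == self._placeholder_model(audio)

    @staticmethod
    def _placeholder_model(audio):
        # Placeholder only: stands in for feature extraction and modeling.
        return hashlib.sha256(audio).hexdigest()


service = InMemoryVoiceprintService()
service.create_identity("user42")
service.create_voiceprint("user42", b"enrollment audio bytes")
print(service.verify("user42", b"enrollment audio bytes"))  # True
service.delete_voiceprint("user42")
print(service.verify("user42", b"enrollment audio bytes"))  # False
```

The sequence is the point of the example: an identity is created first, a voiceprint is attached to it by submitting audio, and later audio is verified against that identity.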

What is VoiceXML?

VoiceXML is an Extensible Markup Language (XML) format that was designed for developers to write interactive voice applications using the same type of programming model as web applications. It is an industry standard and is popular within call centers and IVR systems. It does not require specific hardware to run, nor does it require proprietary extensions for any of the major telephone systems providers. Many client applications leverage the simplicity and power of VoiceXML within their IVR systems to gather speech samples from their users and send them to the VBG system.