Build a Simple CLI-Based Voice Assistant with PyAudio, Speech Recognition, pyttsx3 and SerpApi
Intro
As the title suggests, this demo project shows a very basic voice-assistant script that answers your questions in the terminal based on Google Search results.
You can find the full code in the GitHub repository: dimitryzub/serpapi-demo-projects/speech-recognition/cli-based/
The follow-up blog post(s) will be about:
- Web-based solution using Flask, some HTML, CSS and JavaScript.
- Android & Windows based solution using Flutter and Dart.
What we will build in this blog post
💡Click on the image to open demo video.
Prerequisites
First, let's make sure we work in a separate virtual environment and properly install the libraries the project needs. The hardest part will (possibly) be installing pyaudio.
Virtual Environment and Libraries Installation
Before we start installing libraries, we need to create and activate a new environment for this project:
# if you're on Linux based systems
$ python -m venv env && source env/bin/activate
$ (env) <path>
# if you're on Windows and using Bash terminal
$ python -m venv env && source env/Scripts/activate
$ (env) <path>
# if you're on Windows and using CMD
python -m venv env && .\env\Scripts\activate
$ (env) <path>
Code | Explanation |
python -m venv env | Tells Python to run the venv module (-m) and create an environment folder called env. |
&& | Stands for AND: the second command runs only if the first one succeeds. |
source <venv_name>/bin/activate | Activates the environment so that libraries are installed only into that environment. |
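To double-check that the activation worked, a small sketch like the one below can be run with the environment's Python. The helper name in_virtualenv is hypothetical and not part of the project; it only relies on the standard sys module.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print('Virtual environment active:', in_virtualenv())
print('Packages will be installed under:', sys.prefix)
```

If the first line prints False, the activate command was most likely run in a different terminal session.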
Now install all needed libraries:
pip install rich pyttsx3 SpeechRecognition google-search-results python-dotenv
Now to pyaudio. Please keep in mind that pyaudio may throw an error while installing, so some additional research may be needed on your end.
If you're on Linux, we need to install some development dependencies to use pyaudio:
$ sudo apt-get install -y libasound-dev portaudio19-dev
$ pip install pyaudio
If you're on Windows, it's simpler (tested with CMD and Git Bash):
pip install pyaudio
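Since pyaudio is the most likely package to fail, it can help to confirm that every module the script imports is actually available before running it. The following is a minimal sketch using only the standard library; the helper name missing_modules and the REQUIRED_MODULES list are assumptions made for this illustration.

```python
from importlib.util import find_spec

# Import names used by the script (they differ from some pip package names,
# e.g. google-search-results is imported as serpapi).
REQUIRED_MODULES = ['speech_recognition', 'pyttsx3', 'serpapi', 'rich', 'dotenv', 'pyaudio']


def missing_modules(names):
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print('Missing modules:', ', '.join(missing) if missing else 'none')
```

Any name printed here points at the package that still needs to be installed in the active environment.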
Full Code
import os
import speech_recognition
import pyttsx3
from serpapi import GoogleSearch
from rich.console import Console
from dotenv import load_dotenv

load_dotenv('.env')
console = Console()


def main():
    console.rule('[bold yellow]SerpApi Voice Assistant Demo Project')
    recognizer = speech_recognition.Recognizer()

    while True:
        with console.status(status='Listening to you...', spinner='point') as progress_bar:
            try:
                with speech_recognition.Microphone() as mic:
                    recognizer.adjust_for_ambient_noise(mic, duration=0.1)
                    audio = recognizer.listen(mic)
                    text = recognizer.recognize_google(audio_data=audio).lower()
                    console.print(f'[bold]Recognized text[/bold]: {text}')

                    progress_bar.update(status='Looking for answers...', spinner='line')
                    params = {
                        'api_key': os.getenv('API_KEY'),
                        'device': 'desktop',
                        'engine': 'google',
                        'q': text,
                        'google_domain': 'google.com',
                        'gl': 'us',
                        'hl': 'en'
                    }
                    search = GoogleSearch(params)
                    results = search.get_dict()

                    try:
                        if 'answer_box' in results:
                            try:
                                primary_answer = results['answer_box']['answer']
                            except KeyError:
                                primary_answer = results['answer_box']['result']
                            console.print(f'[bold]The answer is[/bold]: {primary_answer}')
                        elif 'knowledge_graph' in results:
                            secondary_answer = results['knowledge_graph']['description']
                            console.print(f'[bold]The answer is[/bold]: {secondary_answer}')
                        else:
                            # no 'answer_box' here, so this lookup raises KeyError (handled below)
                            tertiary_answer = results['answer_box']['list']
                            console.print(f'[bold]The answer is[/bold]: {tertiary_answer}')

                        progress_bar.stop()  # if the answer was found -> stop progress bar
                        user_prompt_to_continue_if_answer_is_success = input('Would you like to search for something again? (y/n) ')
                        if user_prompt_to_continue_if_answer_is_success == 'y':
                            recognizer = speech_recognition.Recognizer()
                            continue  # run speech recognition again until the user answers 'n'
                        else:
                            console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                            break
                    except KeyError:
                        progress_bar.stop()
                        error_user_prompt = input("Sorry, didn't find the answer. Would you like to rephrase it? (y/n) ")
                        if error_user_prompt == 'y':
                            recognizer = speech_recognition.Recognizer()
                            continue  # run speech recognition again until the user answers 'n'
                        else:
                            console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                            break
            except speech_recognition.UnknownValueError:
                progress_bar.stop()
                user_prompt_to_continue = input("Sorry, didn't quite understand you. Could you say it again? (y/n) ")
                if user_prompt_to_continue == 'y':
                    recognizer = speech_recognition.Recognizer()
                    continue  # run speech recognition again until the user answers 'n'
                else:
                    progress_bar.stop()
                    console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                    break


if __name__ == '__main__':
    main()
Code Explanation
Import libraries:
import os
import speech_recognition
import pyttsx3
from serpapi import GoogleSearch
from rich.console import Console
from dotenv import load_dotenv
Library | Purpose |
rich | Python library for beautiful formatting in the terminal. |
pyttsx3 | Python text-to-speech converter that works offline. |
SpeechRecognition | Python library to convert speech to text. |
google-search-results | SerpApi's Python API wrapper that parses data from 15+ search engines. |
os | To read the secret environment variable. In this case it's the SerpApi API key. |
dotenv | To load your environment variable(s) (SerpApi API key) from the .env file. The .env file can have any name, for example .napoleon, as long as the same name is passed to load_dotenv(). The leading . (dot) is a common convention for environment variable files. |
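The script reads the key with os.getenv('API_KEY'), so the .env file is expected to contain a line of the form API_KEY=<your SerpApi key>. Because os.getenv silently returns None when the variable is missing, a small guard can make that mistake easier to spot. The helper below is a hypothetical sketch (require_env is not part of the project) and would be called after load_dotenv('.env').

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name` or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Fail early instead of sending a request without an API key.
        raise RuntimeError(f'{name} is not set; add {name}=<your key> to the .env file')
    return value


# Intended usage in the script, after load_dotenv('.env'):
# api_key = require_env('API_KEY')
```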
Define rich Console(). It will be used to prettify the terminal output (animations, etc.):
console = Console()
Define the main function, where everything happens:
def main():
    console.rule('[bold yellow]SerpApi Voice Assistant Demo Project')
    recognizer = speech_recognition.Recognizer()
At the beginning of the function we define speech_recognition.Recognizer(), and console.rule creates the following output:
───────────────────────────────────── SerpApi Voice Assistant Demo Project ─────────────────────────────────────
The next step is to create a while loop that will be constantly listening for microphone input to recognize the speech:
while True:
    with console.status(status='Listening to you...', spinner='point') as progress_bar:
        try:
            with speech_recognition.Microphone() as mic:
                recognizer.adjust_for_ambient_noise(mic, duration=0.1)
                audio = recognizer.listen(mic)
                text = recognizer.recognize_google(audio_data=audio).lower()
                console.print(f'[bold]Recognized text[/bold]: {text}')
Code | Explanation |
console.status | A rich status indicator (spinner); it's used only for cosmetic purposes. |
speech_recognition.Microphone() | To start picking up input from the microphone. |
recognizer.adjust_for_ambient_noise | Intended to calibrate the energy threshold with the ambient energy level. |
recognizer.listen | To listen for the user's actual speech. |
recognizer.recognize_google | Performs speech recognition using the Google Speech Recognition API. lower() converts the recognized text to lowercase. |
console.print | A rich print statement that allows text styling, such as bold, italic and so on. |
spinner='point' selects the animation shown next to the status text (run python -m rich.spinner to see the list of available spinners).
After that, we need to initialize the SerpApi search parameters:
progress_bar.update(status='Looking for answers...', spinner='line')
params = {
    'api_key': os.getenv('API_KEY'),  # serpapi api key
    'device': 'desktop',              # device used for the search
    'engine': 'google',               # serpapi parsing engine: https://serpapi.com/status
    'q': text,                        # search query
    'google_domain': 'google.com',    # google domain: https://serpapi.com/google-domains
    'gl': 'us',                       # country of the search: https://serpapi.com/google-countries
    'hl': 'en'                        # language of the search: https://serpapi.com/google-languages
    # other parameters such as locations: https://serpapi.com/locations-api
}
search = GoogleSearch(params)         # where data extraction happens on the SerpApi backend
results = search.get_dict()           # JSON -> Python dict
progress_bar.update will, well, update progress_bar with a new status (the text printed in the console) and switch the animation to spinner='line'.
After that, the data is extracted from Google Search using SerpApi's Google Search Engine API.
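If the assistant should later support other countries or languages, the parameter dict can be produced by a small function instead of being written inline. This is only a sketch: build_search_params and its country and language arguments are hypothetical names, while the dict keys are the ones used in the script.

```python
def build_search_params(query, api_key, country='us', language='en'):
    """Build the SerpApi parameter dict used by the script for one Google search."""
    if not query or not query.strip():
        # An empty query would waste a search request.
        raise ValueError('query must not be empty')
    return {
        'api_key': api_key,
        'device': 'desktop',
        'engine': 'google',
        'q': query.strip(),
        'google_domain': 'google.com',
        'gl': country,
        'hl': language,
    }


params = build_search_params('what is the capital of france', api_key='<your key>')
# Print everything except the key, so it never ends up in the terminal history.
print({key: value for key, value in params.items() if key != 'api_key'})
```

The returned dict can be passed to GoogleSearch(params) exactly as in the script.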
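The next part of the code looks for an answer in the results: first in answer_box (the answer key, then result), then in the knowledge_graph description. After printing the answer, it asks whether to search again; if none of those keys exists, the KeyError is caught and the user is asked to rephrase the question: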
try:
    if 'answer_box' in results:
        try:
            primary_answer = results['answer_box']['answer']
        except KeyError:
            primary_answer = results['answer_box']['result']
        console.print(f'[bold]The answer is[/bold]: {primary_answer}')
    elif 'knowledge_graph' in results:
        secondary_answer = results['knowledge_graph']['description']
        console.print(f'[bold]The answer is[/bold]: {secondary_answer}')
    else:
        # no 'answer_box' here, so this lookup raises KeyError (handled below)
        tertiary_answer = results['answer_box']['list']
        console.print(f'[bold]The answer is[/bold]: {tertiary_answer}')

    progress_bar.stop()  # if the answer was found -> stop progress bar
    user_prompt_to_continue_if_answer_is_success = input('Would you like to search for something again? (y/n) ')
    if user_prompt_to_continue_if_answer_is_success == 'y':
        recognizer = speech_recognition.Recognizer()
        continue  # run speech recognition again until the user answers 'n'
    else:
        console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
        break
except KeyError:
    progress_bar.stop()  # if the answer was not found -> stop progress bar
    error_user_prompt = input("Sorry, didn't find the answer. Would you like to rephrase it? (y/n) ")
    if error_user_prompt == 'y':
        recognizer = speech_recognition.Recognizer()
        continue  # run speech recognition again until the user answers 'n'
    else:
        console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
        break
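The lookup logic above can also be written as a function that returns None instead of relying on KeyError. The sketch below is one possible variant, not the author's code: extract_answer is a hypothetical helper, the sample dicts in the usage lines are made up, and trying the answer_box list key last is an assumption of this sketch.

```python
def extract_answer(results):
    """Return the first available answer from a results dict, or None."""
    answer_box = results.get('answer_box') or {}
    knowledge_graph = results.get('knowledge_graph') or {}

    # Priority mirrors the script (answer, result, knowledge graph description),
    # but a miss falls through to the next source instead of raising KeyError.
    candidates = (
        answer_box.get('answer'),
        answer_box.get('result'),
        knowledge_graph.get('description'),
        answer_box.get('list'),  # assumption: used only as a last resort
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return None


# Made-up result dicts, for illustration only.
print(extract_answer({'answer_box': {'result': '4'}}))
print(extract_answer({'organic_results': []}))
```

With this shape, the calling code only needs a single if answer is None check to decide between printing the answer and asking the user to rephrase.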
The final step is to handle the error raised when the speech picked up by the microphone could not be understood:
# while True:
#     with console.status(status='Listening to you...', spinner='point') as progress_bar:
#         try:
#             speech recognition code
#             data extraction code
          except speech_recognition.UnknownValueError:
              progress_bar.stop()  # if the speech was not understood -> stop progress bar
              user_prompt_to_continue = input("Sorry, didn't quite understand you. Could you say it again? (y/n) ")
              if user_prompt_to_continue == 'y':
                  recognizer = speech_recognition.Recognizer()
                  continue  # run speech recognition again until the user answers 'n'
              else:
                  progress_bar.stop()  # if the user wants to quit -> stop progress bar
                  console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                  break
console.rule() will provide the following output:
───────────────────── Thank you for checking SerpApi Voice Assistant Demo Project ──────────────────────
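The same y/n question is asked in three places in the script, so it could be factored into one helper. The following is a sketch under assumptions: wants_to_continue and its input_fn parameter are hypothetical names, and unlike the script (which compares against 'y' exactly) it also accepts answers such as 'Y' or 'yes'.

```python
def wants_to_continue(prompt, input_fn=input):
    """Ask a y/n question and return True only for an answer starting with 'y'."""
    # input_fn is injectable so the helper can be tested without a terminal.
    reply = input_fn(prompt).strip().lower()
    return reply.startswith('y')


# Example with a canned reply instead of real keyboard input.
print(wants_to_continue('Search again? (y/n) ', input_fn=lambda _: 'Yes'))
```

Inside the loop, each branch would then reduce to an if wants_to_continue(...) check followed by continue or break.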
Add the if __name__ == '__main__' idiom, which protects users from accidentally running the script when it is imported as a module, and call the main function, which runs the whole script:
if __name__ == '__main__':
    main()
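The script imports pyttsx3, but the code shown above only prints the answer and never calls it. A minimal sketch of how the answer could also be read aloud is given below; the speak helper and its engine parameter are hypothetical, and only the pyttsx3 calls init(), say() and runAndWait() are used.

```python
def speak(text, engine=None):
    """Read `text` aloud; `engine` can be injected, otherwise pyttsx3 is used."""
    if engine is None:
        import pyttsx3  # imported lazily so the helper can be tested without audio
        engine = pyttsx3.init()
    engine.say(text)      # queue the phrase
    engine.runAndWait()   # block until the queued speech has been spoken


# Possible usage right after printing the answer in the script:
# speak(f'The answer is {primary_answer}')
```

Creating the engine once at the top of main() and passing it in would avoid re-initializing it for every answer.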