Are you ready to build your own voice assistant like Siri, Alexa, or Jarvis? In this tutorial, you will learn how to code your own voice assistant using Python.
Are you interested in building your own virtual voice assistant like Jarvis in the movie Iron Man? If so, you have come to the right place.
Howdy folks! In this tutorial, you will learn how to build your own personal voice assistant like Jarvis using Python.
You can download the finished project code from my GitHub repo - Final Version.
Now, before getting started, let's understand what we are going to build.
Understanding what we are going to build
The speech recognition program which we are going to build will be able to recognize these commands:
- name - tells its name.
- date - tells the date.
- time - tells the current time.
- how are you? - will say 'I am fine..'.
- search - will search using Google.
- and finally, if we say 'quit' or 'exit', it will terminate.
To achieve all these functionalities, we are going to use mainly three Python modules:
- SpeechRecognition - to recognize our speech and convert it into text using Google's Web Speech API.
- PyAudio - for accessing and working with the microphone.
- pyttsx3 - for converting given text to speech (i.e., for generating the computer voice).
How are we going to build this?
It's basically very simple. We need to create only three functions and that's it!
- The first function, recognize_voice(), will be responsible for capturing our voice (which we input through the microphone), recognizing it, and returning the 'text' version of it.
- Then we will take that 'text' version of our voice and give it to another function called reply(), which will be responsible for replying to us and doing all sorts of other crazy things (like searching Google, telling the current time, etc.).
- Finally, a function called speak(), which will take whatever text we give it and convert it into speech.
We will keep calling these functions in an infinite loop until the user says 'quit' or 'exit'.
Requirements
- You should be comfortable with Python 3.
- You should have Python 3.3 or a higher version installed on your computer.
- You should have venv available. Since Python 3.3, venv is included in the Python standard library and requires no additional installation.
- You should have a microphone (your laptop's built-in one or the one on your earphones will do the job).
- You need an Internet connection, because the speech is recognized through Google's Web Speech API.
- Finally, you should have a modern code editor like Visual Studio Code.
With these things in place, let's get started.
Initial Setup
- First, create a folder named voice_assistant anywhere on your computer.
- Then open it inside Visual Studio Code.
Now let's make a new virtual environment using venv and activate it. To do that:
- Open Terminal > New Terminal.
- Then type:
This command will create a virtual environment named venv for us.
- To activate it, if you are on Windows, type the following:
- If you are on Linux/Mac, then:
Now you should see something like this:
Note: Virtual environments like venv help us to keep all the dependencies related to the current project in its own environment isolated from the main computer. That's one of the main reasons why we are using it.
- Finally, create a new file named 'main.py' directly inside the voice_assistant folder like below:
- Now you will have something similar to this:
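If you want to double-check that the virtual environment is really active, here is a minimal sketch (not part of the original project; the helper name in_virtualenv() is my own). It relies on the fact that, inside a venv, sys.prefix points at the environment while sys.base_prefix still points at the base Python installation.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter in use:", sys.executable)
```

Run it from the same terminal in which you activated venv. If it reports False, the activation step most likely did not take effect in that terminal.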
That's it, now let's install those required modules.
Installing the requirements
For recognizing our voice and converting it into text, we need an additional module called SpeechRecognition, so let's install it. Type the following command in the terminal:
Since we are using the microphone as the input source, we also need to install the PyAudio package.
The process for installing PyAudio will vary depending on your operating system.
For Linux:
If you are on Mac:
If you are on Windows:
If you get any errors installing PyAudio on Windows, refer to this StackOverflow solution. If you are on a different operating system, try Googling the error. If you still get errors, feel free to comment below.
Once you've got PyAudio installed, you can test the installation from the terminal by typing this:
Make sure your default microphone is on and unmuted. If the installation worked, you should see something like this:
If you are using Ubuntu, then you may get some errors of the form 'ALSA lib [..] Unknown PCM' like this:
To suppress those errors, see this Stack Overflow answer.
Now to give the program the ability to talk, we have to install the pyttsx3 module:
pyttsx3 is a Text to Speech (TTS) library for Python 2 and 3. It works without an internet connection or delay. It also supports multiple TTS engines, including Sapi5, nsss, and espeak.
That's it, we have installed and set up all the prerequisites. Now it's time to write the program itself, so let's do that.
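Before writing the program, you may want to confirm that all three packages can actually be imported. The following is a small sketch of my own (the names REQUIRED_MODULES and check_modules() are made up for illustration). Note that the import names differ from the package names: SpeechRecognition is imported as speech_recognition and PyAudio as pyaudio.

```python
from importlib.util import find_spec

# pip package name -> name used in the import statement
REQUIRED_MODULES = {
    "SpeechRecognition": "speech_recognition",
    "PyAudio": "pyaudio",
    "pyttsx3": "pyttsx3",
}


def check_modules():
    """Return {package name: True/False} telling whether each one is importable."""
    return {
        package: find_spec(module) is not None
        for package, module in REQUIRED_MODULES.items()
    }


if __name__ == "__main__":
    for package, installed in check_modules().items():
        print(f"{package}: {'installed' if installed else 'missing'}")
```

Anything reported as missing was probably installed outside the active virtual environment, or its installation failed.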
recognize_voice()
First of all, let's import everything we need.
Type the following code inside the main.py file:
- First, we are importing the speech_recognition module as sr.
- Then we are importing the sleep() function from the time module. We will use it in a bit to add a short artificial delay.
- Then, to know the current date and time, we need the datetime module.
- Then, to open up a browser and do a Google search, we need the help of the webbrowser module.
- Finally, as I said earlier, to convert text to speech, we need pyttsx3.
All of the magic in SpeechRecognition happens with the Recognizer class. So let's instantiate it next:
Now configure pyttsx3:
- pyttsx3 will be responsible for generating the computer voice. To inspect or change the gender, age, speed, etc. of the generated voice, read this description.
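The setup described above might look roughly like the sketch below. Treat it as an illustration under assumptions, not the exact code from the project: the helper names make_recognizer() and configure_engine(), the rate of 150, and the choice of the first installed voice are all mine. The library calls themselves (sr.Recognizer(), engine.setProperty(), engine.getProperty()) are part of SpeechRecognition and pyttsx3.

```python
def make_recognizer():
    """Create the Recognizer object that will transcribe our audio."""
    # Imported lazily so this file still loads if the package is missing.
    import speech_recognition as sr
    return sr.Recognizer()


def configure_engine(engine, rate=150, voice_index=0):
    """Set the speaking rate and pick a voice on a pyttsx3 engine."""
    engine.setProperty("rate", rate)  # speaking speed; 150 is an assumed value
    voices = engine.getProperty("voices")  # voices installed on this machine
    if voices:
        # Clamp the index so asking for a missing voice cannot raise IndexError.
        chosen = voices[min(voice_index, len(voices) - 1)]
        engine.setProperty("voice", chosen.id)
    return engine


# Typical use in main.py (needs the real libraries and audio hardware):
#   import pyttsx3
#   recognizer = make_recognizer()
#   engine = configure_engine(pyttsx3.init())
```

Which voices are available depends on your operating system, so try a few values of voice_index to hear the difference.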
Now let's create that recognize_voice() function. It will do the following:
- Listens to our microphone.
- Recognizes our voice with the help of the recognize_google() function.
- Converts it into text format.
- And then returns that text version of our voice.
Create the recognize_voice() function like below:
- If an error happens, for example because your Internet connection is bad, it will just speak() an appropriate message.
Remember that the speak() function is not a built-in function. We have to create it, and we will do that at the end because it is a small function.
Also remember that this speak() function will convert the given text to speech (the computer-generated voice).
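Here is a hedged sketch of what recognize_voice() could look like, based only on the description above. The error messages, the lowercase conversion, and the speak parameter (which defaults to print so the sketch stands on its own) are my assumptions; in the tutorial, the function calls the global speak() instead.

```python
def recognize_voice(speak=print):
    """Listen on the default microphone and return the recognized text in lowercase."""
    # Imported lazily; sr.Microphone additionally needs PyAudio to be installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    text = ""
    with sr.Microphone() as source:
        # Sample the background noise briefly so speech is easier to detect.
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    try:
        # Sends the audio to Google's Web Speech API, so it needs the Internet.
        text = recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The audio was captured but could not be understood.
        speak("Sorry, I did not get that.")
    except sr.RequestError:
        # The API could not be reached (for example, no Internet connection).
        speak("Sorry, the speech service is not reachable right now.")
    return text.lower()
```

Returning an empty string on failure keeps the caller simple: it can pass the result on without checking for None.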
Now at the very bottom of the file, type the following:
- After making a delay of 1 second, we start an infinite loop.
- Then speak() the message 'Start speaking..', which will be like a prompt for the end-user.
- Then we listen for the voice and convert it into text format using the recognize_voice() function which we just created.
- Now we have the text_version of our spoken input. We can use it to generate responses such as telling the date or the current time, or searching Google, according to what we asked for.
- That's what the reply() function is going to do.
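The loop described in the steps above can be sketched like this. To keep the sketch runnable on its own, the three functions are passed in as arguments, and the loop stops when reply() returns False. Both of those choices, as well as the name run_assistant(), are my assumptions; the program may just as well end itself from inside reply().

```python
from time import sleep


def run_assistant(recognize_voice, reply, speak, startup_delay=1):
    """Prompt, listen, and reply in a loop until reply() asks us to stop."""
    sleep(startup_delay)  # short pause before the first prompt
    while True:
        speak("Start speaking..")  # acts as a prompt for the user
        text_version = recognize_voice()
        # Assumed convention: reply() returns False after 'quit' or 'exit'.
        if reply(text_version) is False:
            break
```

Passing the functions in also makes it easy to try the loop with keyboard input, for example run_assistant(input, reply, print).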
Now let's create that reply() function.
reply()
This function will accept text_version as an argument and then act accordingly. Type the following code below the recognize_voice() function which we created earlier:
- See, it's very simple. All we are doing is checking whether a certain piece of text is present in the given text_version. If we find one of the texts we are looking for, we act accordingly: speak() the current time or date, or search Google by opening the web browser.
- Again, we are using the speak() function but haven't created it yet. That's what we are going to do next.
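As a rough sketch of that idea (not the original code), reply() could be written as below. The spoken phrases, the date and time formats, and the extra parameters speak, listen, and open_url are assumptions that let the sketch run without a microphone or speakers. In the assistant itself you would pass the real speak() and recognize_voice() functions.

```python
from datetime import datetime
from urllib.parse import quote_plus
import webbrowser


def reply(text_version, speak=print, listen=input, open_url=webbrowser.open):
    """React to the recognized text; return False when the user wants to quit."""
    text = text_version.lower()
    if "quit" in text or "exit" in text:
        speak("Goodbye.")
        return False
    if "name" in text:
        speak("My name is Jarvis.")  # the assistant's name is an assumption
    if "how are you" in text:
        speak("I am fine..")
    if "date" in text:
        speak(datetime.now().strftime("Today is %A, %d %B %Y."))
    if "time" in text:
        speak(datetime.now().strftime("It is %H:%M."))
    if "search" in text:
        speak("What should I search for?")
        query = listen()  # ask a follow-up question for the search words
        if query:
            # quote_plus() escapes spaces and special characters for the URL.
            open_url("https://www.google.com/search?q=" + quote_plus(query))
    return True
```

Because the checks are plain substring tests, a sentence such as 'tell me the date and time' triggers both the date and the time answers.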
speak()
Type the following code above/below reply() function:
- Pretty straightforward, isn't it? Here we are using the engine we instantiated earlier to say() the text we give it. That's the only thing we are doing inside the speak() function.
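A minimal sketch of speak() could look like this. The optional engine parameter is my addition so the function can be tried with a stand-in object; in the tutorial, the engine is created once near the top of main.py and used directly.

```python
def speak(text, engine=None):
    """Say the given text aloud using a pyttsx3 engine."""
    if engine is None:
        # Imported lazily; requires the pyttsx3 package and a working audio setup.
        import pyttsx3
        engine = pyttsx3.init()
    engine.say(text)     # queue the text to be spoken
    engine.runAndWait()  # block until everything queued has been spoken
```

Without the runAndWait() call, the text would only be queued and you would hear nothing.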
That's it! You have successfully created your own Python voice assistant in no time!
Now let's test it. Type the following code inside the terminal window at the bottom:
Go on, ask a few questions like 'What is your name?' or 'What is the date today?', or say 'Search Google'.
Have fun with it..
Final Code
Here is the final version of the main.py file. If you get any errors, cross-check your code against the following:
Wrapping Up
I hope you enjoyed this tutorial. In some places, I intentionally skipped the explanation because the code was simple and self-explanatory. I left it to you to decode it on your own.
True learning takes place when you try things on your own. Simply following a tutorial won't make you a better programmer. You have to use your own brain.
If you still have any errors, first try to work them out on your own by Googling them.
Only if you can't find a solution, comment below. You should know how to find and resolve a bug on your own; that's a skill every programmer should have!
And that's it, Thank you ;)