This repository contains the server developed for the DigiTala in Action project.
- Develop a server that processes speech sent by a mobile app and returns five speech rating scores: fluency, pronunciation, range, accuracy, and holistic.
- Our main goals are as follows:
- The server must be secure and reliable: it should run with minimal maintenance for the next 4-5 years. If the instance restarts, the server should start again automatically, and no data should be lost.
- Simple maintenance and setup: the maintainer may not be familiar with server development. We may also need to migrate the server to a better or worse CSC instance in the future, so easy setup is important.
- For these reasons, we prefer Docker/Podman, but if the server can run without any container (which is simpler), we are also fine with that solution.
- Please comment your code.
- We would like one script to set up the server (see Server_setup.md for an example) and another script to rebuild the container when we make changes (see the SaySvenska server).
- Processing 45 s of audio with AI usually takes a long time (around 10-20 s), so take that into account. For example, what happens if two people submit speech from the mobile app at the same time? Does the server crash?
- We also need the server to store some pseudonymized user data: UUID (generated by the mobile app), consent, timestamp, speech (depending on the user's consent), scores, etc. You can use MongoDB, but for simplicity, saving the data in a CSV file (and the audio files in a folder) is preferred.
- Other requirements may come up during the project, but they must fit your timeline (a complex feature request at the end of the project is a no-go).
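The concurrency question above can be handled by limiting how many model runs execute at once, so a second simultaneous request waits its turn instead of crashing the server or running it out of memory. Below is a minimal sketch using only the Python standard library, assuming an async framework such as FastAPI; `run_model` is a hypothetical stand-in for the real AI model, and `MAX_CONCURRENT_JOBS = 1` is an assumed limit, not a measured one.

```python
import asyncio
import time

# Assumption: allow only one model inference at a time; tune for the instance's CPU/RAM.
MAX_CONCURRENT_JOBS = 1


def run_model(wav_bytes: bytes) -> dict:
    """Hypothetical placeholder for the real (slow, blocking) AI model call."""
    time.sleep(0.2)  # Stands in for the 10-20 s of real processing
    return {"fluency": 4.3, "pronunciation": 2.4, "range": 3.3,
            "accuracy": 4.9, "holistic": 4.0}


async def assess(wav_bytes: bytes, slots: asyncio.Semaphore) -> dict:
    # Requests beyond the limit wait here instead of all running the model at once.
    async with slots:
        # Run the blocking model in a worker thread so the event loop
        # stays free to accept new requests in the meantime.
        return await asyncio.to_thread(run_model, wav_bytes)


async def main() -> list:
    # Create the semaphore inside the running event loop (e.g. at app startup).
    slots = asyncio.Semaphore(MAX_CONCURRENT_JOBS)
    # Simulate two users submitting speech at the same time.
    return await asyncio.gather(assess(b"user-a", slots), assess(b"user-b", slots))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main())
    elapsed = time.perf_counter() - start
    # Both requests succeed; the second simply waits for the first.
    print(len(results), results[0]["holistic"], f"{elapsed:.1f}s")
```

Here both simultaneous requests succeed, the second finishing roughly one model run after the first. Raising `MAX_CONCURRENT_JOBS` shortens waiting times at the cost of more memory, so the right value depends on the CSC instance.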
For more information, see the SaySvenska server: https://github.com/Usin2705/SaySvenska/tree/main/Server
An example of the API (somewhat outdated) is in the SaySuomi README file: https://github.com/Usin2705/CaptainA_unity/tree/main
Each request from the mobile app contains:
- audio data: an attached file in .wav format
- guid: a text string that identifies the user
- other text (or numeric) data collected from user feedback
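Because the audio arrives as an attached .wav file, the server can check it before spending 10-20 s on the model. A minimal sketch using Python's standard `wave` module, assuming uncompressed PCM WAV (the format `wave` reads); the `MAX_SECONDS = 60` limit and the helper names `check_wav` and `make_silence` are illustrative choices, not project requirements.

```python
import io
import wave

# Assumption: reject recordings much longer than the expected ~45 s.
MAX_SECONDS = 60.0


def check_wav(data: bytes) -> float:
    """Return the duration in seconds of a PCM .wav file, or raise ValueError."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        raise ValueError(f"not a readable .wav file: {exc}") from exc
    if rate <= 0:
        raise ValueError("invalid sample rate")
    duration = frames / rate
    if duration > MAX_SECONDS:
        raise ValueError(f"audio too long: {duration:.1f} s")
    return duration


def make_silence(seconds: float, rate: int = 16000) -> bytes:
    """Build an in-memory mono 16-bit .wav file of silence (for testing)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


if __name__ == "__main__":
    print(check_wav(make_silence(2.0)))  # 2.0
```

An endpoint could call `check_wav` on the uploaded bytes and answer with an error status when it raises, so malformed uploads never reach the model.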
from flask import Flask, jsonify, request

# Flask example; you will use FastAPI, so the code will be slightly different.
app = Flask(__name__)


# Hypothetical route path, for illustration only.
@app.route("/assess_speech", methods=["POST"])
def func_assess_speech():
    # Function for assessing the user's speaking skill
    wav_file = request.files["file"]  # The attached audio file (.wav)
    guid = request.form["guid"]  # The GUID sent by the app, to identify the user
    other_info = request.form["other_info"]  # Other info needed/collected

    # Example only: an AI model would process the audio file here, e.g.
    # fluency, pronunciation, range_score, accuracy, holistic = ai_model.process(wav_file)
    # For example, you will get:
    fluency = 4.3
    pronunciation = 2.4
    range_score = 3.3
    accuracy = 4.9
    holistic = 4.0

    return jsonify({
        "fluency": fluency,
        "pronunciation": pronunciation,
        "range": range_score,
        "accuracy": accuracy,
        "holistic": holistic,
    }), 200
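For the storage requirement (pseudonymized data in a CSV file, audio in a folder, nothing lost if the instance restarts), one possible approach is to append one row per request and force it to disk immediately. The sketch below uses only the standard library; the column names, the `records.csv` and `audio/` layout, and the `save_record` helper are assumptions for illustration. Audio is written only when the user has consented, and a lock keeps rows from interleaving when two requests finish at the same time.

```python
import csv
import os
import tempfile
import threading
import uuid
from datetime import datetime, timezone
from pathlib import Path

# Assumed column layout; adjust to the fields the project actually needs.
FIELDS = ["uuid", "consent", "timestamp", "audio_file",
          "fluency", "pronunciation", "range", "accuracy", "holistic"]
_csv_lock = threading.Lock()  # Serializes writes from concurrent requests (one process)


def save_record(data_dir: Path, user_uuid: str, consent: bool,
                wav_bytes: bytes, scores: dict) -> None:
    """Append one pseudonymized record to data_dir/records.csv (hypothetical helper)."""
    audio_dir = data_dir / "audio"
    audio_dir.mkdir(parents=True, exist_ok=True)
    audio_name = ""
    if consent:  # Keep the speech only if the user agreed
        audio_name = f"{user_uuid}_{uuid.uuid4().hex}.wav"
        (audio_dir / audio_name).write_bytes(wav_bytes)
    row = {"uuid": user_uuid, "consent": consent,
           "timestamp": datetime.now(timezone.utc).isoformat(),
           "audio_file": audio_name, **scores}
    csv_path = data_dir / "records.csv"
    with _csv_lock:
        is_new = not csv_path.exists()
        with open(csv_path, "a", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            if is_new:
                writer.writeheader()
            writer.writerow(row)
            f.flush()
            os.fsync(f.fileno())  # Force the row to disk before replying


if __name__ == "__main__":
    scores = {"fluency": 4.3, "pronunciation": 2.4, "range": 3.3,
              "accuracy": 4.9, "holistic": 4.0}
    with tempfile.TemporaryDirectory() as tmp:
        save_record(Path(tmp), str(uuid.uuid4()), True, b"fake-wav-bytes", scores)
        print((Path(tmp) / "records.csv").read_text(encoding="utf-8"))
```

If the server runs in Docker/Podman, `data_dir` would typically point to a mounted volume so the data survives rebuilding the container. Note that `threading.Lock` only protects writes within a single process; running several worker processes would need a file lock or a single writer.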