Google master #73

Open: wants to merge 8 commits into base: master
6 changes: 6 additions & 0 deletions README.md
@@ -1,3 +1,9 @@
# ViSQOL Modified

This is a modified version of ViSQOL that also reports the Audio Delay and its Sample Adjustments (in seconds). For context, the ViSQOL algorithm aligns the two samples (reference and degraded) so that they can be compared. First, it performs a global alignment (reported here as the Audio Delay), which uses cross-correlation to find the best match. It then runs `Voice Activity Detection` and, after that, `Patch Alignment` (reported here as the Audio Delay Sample Adjustment). For a fuller explanation of ViSQOL, see [this arXiv paper](https://arxiv.org/pdf/2004.09584.pdf). The sample adjustment, or patch alignment, matters for understanding how the audio behaves beyond its quality score: sped-up or slowed-down regions show up as larger Sample Adjustments, so the audio "experience" is worse when these numbers are too high. What counts as too high depends on the length of your audio file and the goals of the application under test; for a 7-second file, for example, values above roughly 0.05 are high.

The original ViSQOL code is available [here](https://github.com/google/visqol).

# ViSQOL

ViSQOL (Virtual Speech Quality Objective Listener) is an objective, full-reference metric for perceived audio quality. It uses a spectro-temporal measure of similarity between a reference and a test speech signal to produce a MOS-LQO (Mean Opinion Score - Listening Quality Objective) score. MOS-LQO scores range from 1 (the worst) to 5 (the best).
9 changes: 9 additions & 0 deletions src/alignment.cc
@@ -11,6 +11,8 @@
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <iostream>
using namespace std;

#include "alignment.h"

@@ -91,6 +93,13 @@ std::tuple<AudioSignal, double> Alignment::GloballyAlign(
}
AudioSignal new_degraded_signal{std::move(new_degraded_matrix),
degraded_signal.sample_rate};
// The first call prints the global Audio Delay; later calls print adjustments.
static bool is_first_call = true;
const double delay_seconds = -static_cast<double>(best_lag) / ref_signal.sample_rate;
if (is_first_call) {
  is_first_call = false;
  std::cout << "Audio Delay: " << delay_seconds << " Seconds at " << ref_signal.sample_rate << "Hz\n";
} else {
  std::cout << "Audio Delay Sample Adjustment: " << delay_seconds << " Seconds\n";
}
return std::make_tuple(
new_degraded_signal,
best_lag / static_cast<double>(degraded_signal.sample_rate));
Binary file added testdata/.DS_Store
Binary file not shown.