
  • Research Article   
  • J Neurol Disord, Vol 7(1)
  • DOI: 10.4172/2329-6895.1000404

Encoding and Assessing Sung Melodies in Stroke Patients with Aphasia

Anthony Androulakis*
Center for the Study of Aphasia Recovery (C-STAR), University of South Carolina, Columbia, South Carolina, USA
*Corresponding Author: Anthony Androulakis, Center for the Study of Aphasia Recovery (C-STAR), University of South Carolina, Columbia, South Carolina, USA, Tel: 803-777-7700, Email: [email protected]

Received Date: Feb 19, 2019 / Accepted Date: Mar 15, 2019 / Published Date: Mar 22, 2019

Abstract

Aphasia is a language and communication disorder caused by damage to the brain, usually occurring after stroke or traumatic brain injury. Two MATLAB computer programs are presented for encoding and assessing the sung melodies of stroke patients. The first program, Sung Melody to Matrix (SMM.m), converts a patient’s sung melody into a matrix containing the frequency and corresponding duration of each note sung. To find when the patient moves from one note to another, a novel method called Visual Audio Signal Envelope (VASE) is used, which determines an audio signal’s envelope through visual cues. Other existing envelopes that were tested did not work as well with the voices of post-stroke patients recorded in a noisy environment. The second program, Melodic Fidelity Evaluator (MFE.m), compares this matrix to the matrix of the tune that the patient was trying to imitate and provides a fair assessment of the patient’s note interval error and time interval error. In addition, these programs are easy to use and can be automated for large data sets to correlate singing performance with brain lesions in stroke patients.

Keywords: Aphasia; Stroke; MATLAB; Melody; Signal envelope; Sung Melody to Matrix (SMM.m); Visual Audio Signal Envelope (VASE); Melodic Fidelity Evaluator (MFE.m); Euclidean distance

Introduction

Aphasia is a language and communication disorder that can take away one’s ability to speak. Ongoing research focuses on the relationship between the singing abilities of post-stroke patients and the damage in their brains. Moreover, singing has been used as a treatment for aphasic patients, as in Melodic Intonation Therapy [1]. To evaluate the abilities of post-stroke patients to repeat melodies, simple melodies were first played to the patients, who were asked to repeat them. The patients were recorded in a casual (not noiseless) environment as they attempted to repeat the melodies. Assessing the patients’ sung melodies can be a demanding task when many recordings need to be analyzed, so a computer program is more efficient and consistent. However, there is currently no computer program that fairly assesses the sung melody of a post-stroke individual in a noisy real-world environment. The fair assessment of the singing voice, that is, finding and assessing the notes that a person most likely sang, has been a subjective matter. This assessment becomes even more subjective with post-stroke patients, since many have difficulties speaking and singing. Therefore, the automatic and objective assessment of the sung melodies of post-stroke patients is very important. The purpose of this project is to provide this automatic and objective assessment, which can be used to correlate brain imaging findings with melodic repetition impairment.

This project consists of two programs written in MATLAB: Sung Melody to Matrix (SMM.m) and Melodic Fidelity Evaluator (MFE.m). These programs are available on GitHub [2]. The first program, SMM.m, encodes a sung melody into a matrix. The task of encoding the clear singing voice of a non-aphasic individual in an ideal noiseless environment has been addressed by many automatic pitch detection algorithms [3]. This task is similar to monophonic singing transcription based on hysteresis of the pitch-time curve, intonation, auditory models, and probabilistic methods [4-9]. Although there are speech recognition programs and even machine-learning algorithms that recognize the words of aphasic patients [10,11], these programs do not apply to melodies sung by post-stroke patients. The program presented here is based on a new envelope that I constructed using visual cues. The second program, MFE.m, gives a fair (and consistent) assessment of the attempt of a post-stroke patient to imitate a given original tune. Although there exists a program that can evaluate the difference between two audio inputs [12], the presented program gives a fair and informative evaluation of the singing ability of stroke patients: it reports separate errors for note intervals, time durations, and the number of notes added or deleted. Other singing assessment programs, designed mainly for singers who seek to improve their abilities, can be found in [13-15]; these use dynamic time warping (DTW) and machine learning based on pitch interval accuracy and vibrato. Our approach uses simple rules that I find to be fair given the abilities of aphasic patients [16,17].

Implementation and Architecture

The outline of how the two programs SMM.m and MFE.m work together along with example inputs and outputs is shown in Figure 1.


Figure 1: The input of SMM.m is an audio file and the output is a matrix containing the frequencies and corresponding durations for each note sung. The inputs of MFE.m are the output matrix of SMM.m as well as a matrix of the frequencies and corresponding durations of the notes of the tune the patient was attempting to repeat. The outputs of MFE.m are the duration error, the note interval error, and the number of notes added or deleted.

The implementation of the two programs to evaluate the repetition ability of a post-stroke patient is as follows:

1) A simple melody is played to the post-stroke patient. Examples of such melodies are the following (Figure 2):


Figure 2: The WAV files for these tunes can be found on GitHub [2]; they are named tune1.wav, tune2.wav and tune3.wav, respectively.


2) While being recorded, the patient attempts to repeat the played tune.

3) Open a 2018 version of MATLAB. You will need the following two toolboxes:

• Image Processing Toolbox (version 10.2 was used)

• Audio System Toolbox (version 1.4 was used)

4) Clear your workspace variables (using the command clear) and close any figures that may be open (using the command close).

5) Find the path of the recording of the patient, which will be put into a variable called filename as a string. An example on a Mac computer would be:

filename = '/Users/Androulakis/Recordings/patient tune1.wav';

and likewise on Windows:

filename = 'C:\Users\Androulakis\Recordings\patient tune1.wav';

Note that the apostrophe used (') is straight and not curved as in (’); otherwise, MATLAB will give you an error. Of course, your path does not have to be identical to the one shown above; this is just an example. This is how your recording will be inputted into SMM.m.

6) Run SMM.m. The output will be in a variable called PHz (PHz stands for Patient Hertz) containing the frequencies and corresponding durations for each note sung.

7) Manually create the matrix of the frequencies and durations of the melody the patient attempted to repeat. For finding the frequencies of the notes, Wikipedia’s Piano Key Frequencies chart [18] was used. For finding the durations of the notes, use the formula applicable to the tempo of your tune (Table 1).

Note value	Duration in seconds
Half note	120/(beats per minute)
Quarter note	60/(beats per minute)
Eighth note	30/(beats per minute)
Sixteenth note	15/(beats per minute)

Table 1: For finding the durations of the notes, use the formula for each note value (assuming the quarter note gets the beat at the given tempo).
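
For instance, at 95 beats per minute an eighth note lasts 30/95 ≈ 0.32 seconds, a quarter note 60/95 ≈ 0.63 seconds, and a half note 120/95 ≈ 1.26 seconds; these are exactly the entries in the duration row of the tune 1 matrix below.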

For example, the matrices of the tunes given above in Step 1 would be inputted as follows (OHz Stands for Original Hertz):

Matrix of Tune 1: OHz=[440 493.8833 523.2511 391.9954 440 391.9954 329.6276;30/95 30/95 30/95 30/95 60/95 60/95 120/95];

Matrix of Tune 2: OHz=[261.6256 293.6648 329.6276 349.2282 329.6276 391.9954;60/120 30/120 30/120 90/120 30/120 60/120];

Matrix of Tune 3: OHz=[391.9954 349.2282 349.2282 523.2511 493.8833;90/110 30/110 30/110 30/110 90/110];

Even though these matrices could be computed with SMM.m, that program works better with the human voice; for better accuracy, manually create the matrices of the original tunes.

8) Now that you have both the variables PHz and OHz in your MATLAB workspace, run MFE.m. This program will output the Time Error (seconds), the Note Interval Error (semitones), and the Number of Notes Added (+)/Deleted (-). The time error and the note interval error are saved in the variable BR, and the number of notes added or deleted is saved in a separate variable.
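
To illustrate steps 4 through 8 end to end, a session for tune 1 might look like the following minimal sketch; it assumes the example path from step 5 and that SMM.m and MFE.m are on the MATLAB path and run as scripts sharing the base workspace.

% Hypothetical end-to-end session for tune 1 (steps 4 through 8).
clear; close all;
filename = '/Users/Androulakis/Recordings/patient tune1.wav';
SMM                                     % step 6: encodes the recording into PHz
OHz = [440 493.8833 523.2511 391.9954 440 391.9954 329.6276; ...
       30/95 30/95 30/95 30/95 60/95 60/95 120/95];   % step 7: tune 1
MFE                                     % step 8: errors are saved in BR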

The SMM.m Code

In this section, the MATLAB code SMM.m, which encodes a sung melody into a matrix, is described. In a melody, two notes never overlap. Therefore, the mathematical objects that characterize the melody are the frequencies and time durations of the notes. Thus, a matrix of two rows suffices, where one row is reserved for the frequencies of the notes and the other row is reserved for the corresponding time durations. To identify this matrix, the program SMM.m first cuts the sung melody into intervals, where each interval contains a single note, and then computes the median frequency and time length in each of those intervals. The input of the SMM.m program is an audio file. Acceptable formats are .wav, .ogg, .flac, .au, .aiff, .aif, .aifc, .mp3, .m4a and .mp4. The program has been tested extensively with 1,269 recordings of human singing, each of length up to 3.6 seconds. Only one channel of the audio recording is considered.

% Read the recording and keep only the first channel.
[y,fs] = audioread(filename);
song = y(:,1);
clear y

The amplitude graph of the sung melody is usually highly oscillatory, as can be seen in Figure 3.


Figure 3: The highly oscillatory amplitude graph of the sung melody of a post-stroke patient can be seen here.

Therefore, the amplitudes are first converted into decibels for better readability. Figure 4 shows the graph of the sung melody of a post-stroke patient, converted into decibels and shifted above the x-axis for better image processing.


Figure 4: The graph of the sung melody converted into decibels and shifted above the x-axis is shown here. The decibels graph is not as oscillatory as the amplitude graph.

% Convert the amplitudes into decibels.
decibelsSong = mag2db(song);
% Replace -Inf entries (zero amplitude) with the smallest finite value.
decibelsSong(decibelsSong == -Inf) = NaN;
decibelsSong(isnan(decibelsSong)) = min(decibelsSong);
% Shift the graph above the x-axis.
decibelsSong = decibelsSong - min(decibelsSong);

During singing, at the passage from one note to the next, the loudness of the human voice usually dips. Unfortunately, the decibels graph is still slightly oscillatory and thus obscures the voice dips which occur when the post-stroke patient changes notes. To reveal and accentuate these dips, a simple graph is constructed which outlines the graph of the decibels. This simple graph is called an envelope. The envelope should not be prone to err from oscillations created by noise. There exist signal enveloping functions in MATLAB, such as the peak, Hilbert, and rms (root-mean-square) envelopes, but none performed well when tested on post-stroke singing. Also, the currently existing voice enveloping functions that were tested erroneously signified sudden unintended sounds such as taps or door slams. Here a novel envelope function is presented, called Visual Audio Signal Envelope (VASE), that uses visual cues of an audio signal’s graph to identify a rough outline (envelope) of the decibels graph. VASE is described next. First, the picture of the decibels graph is extracted with a line thickness of 1 and without the axes. The purpose of the line thickness is to fill the spaces between the almost vertical lines of the decibels graph.


Then this image is blurred with a motion filter at 45° to “fill” the small oscillatory dips that occur in the graph. Next, a convolution of the blurred picture is taken, which further blurs the borders of the already blurred object. Finally, the contour of this blurred object is drawn and saved in a variable called contourData.

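The published implementation of these steps is in SMM.m on GitHub [2]. A minimal sketch of the idea might look as follows, where the binarization threshold, the motion-filter length of 20, and the 5 × 5 smoothing kernel are illustrative assumptions rather than the published values.

% Sketch of VASE (assumed parameter values; see SMM.m [2] for the real code).
fig = figure('Visible','off');
plot(decibelsSong, 'k', 'LineWidth', 1);       % line thickness of 1
axis off;                                      % no axes
frame = getframe(fig);                         % grab the plot as an RGB image
close(fig);
bw = rgb2gray(frame.cdata) < 128;              % dark curve -> binary shape
h = fspecial('motion', 20, 45);                % motion filter at 45 degrees
blurred = imfilter(double(bw), h);             % blur the thickened curve
blurred = conv2(blurred, ones(5)/25, 'same');  % further blur the borders
contourData = contourc(blurred, [0.5 0.5]);    % contour of the blurred object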

Then, x in frames (a time unit) and y in decibels on the positive axis are computed from the variable contourData. This task is tedious because one must ensure that the x coordinates of the envelope are in increasing order and do not repeat. However, the dimensions of the contour are proportional to the original decibels graph and therefore can be scaled appropriately (Figure 5).


Figure 5: The graph of the decibels above the x-axis and the Visual Audio Signal Envelope (VASE) are shown here. VASE is graphed by plotting the variable y against the variable x.

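A sketch of this extraction, assuming contourc returned a single contour block of the form [level x1 x2 ...; npts y1 y2 ...]:

% Sketch: recover (x, y) from the contour matrix (single block assumed).
n  = contourData(2,1);                  % number of contour vertices
cx = contourData(1, 2:n+1);
cy = contourData(2, 2:n+1);
[x, ix] = unique(cx);                   % increasing x without repeats
y = cy(ix);
y = max(y) - y;                         % image rows run top-down, so flip
x = x * numel(decibelsSong) / max(x);   % rescale to frames
y = y * max(decibelsSong) / max(y);     % rescale to decibels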

VASE does not erroneously outline sharp unintended sounds, as Figure 6 shows.


Figure 6: Another signal graph with its corresponding VASE is shown here. The sharp unintended door slam at approximately frame 0.4 × 10^5 does not affect VASE.

This completes the description of VASE. Then the local minima of the envelope are found, controlled by a minimum separation parameter of 50 frames and a minimum prominence parameter of 1. The locations of these minima are the points where the patient changes notes.
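
With the x and y of the envelope computed as above, this step could be sketched with MATLAB’s islocalmin; treating the separation parameter as a count of envelope samples is an assumption of the sketch.

% Sketch: prominent dips of the envelope mark the note changes.
dips = islocalmin(y, 'MinProminence', 1, 'MinSeparation', 50);
noteChanges = x(dips);                  % frames where the patient changes notes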

Finally, the time matrix is defined, which contains the beginning and end, in frames, of each note sung. In each interval, the fundamental frequency (f0) is estimated using MATLAB’s pitch function from the Audio System Toolbox with the Pitch Estimation Filter (PEF) [19]. The frequencies and time intervals of each note are then saved as the first and second rows of the final matrix, respectively. In the program, the matrix created from the patient audio is called PHz.

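A minimal sketch of this final step, consistent with the description above (the segment bookkeeping and the use of the median are assumptions of the sketch, and segments are assumed long enough for the default pitch window):

% Sketch: estimate f0 on each interval and assemble the PHz matrix.
bounds = [1, round(noteChanges), numel(song)]; % note boundaries in frames
PHz = zeros(2, numel(bounds) - 1);
for k = 1:numel(bounds) - 1
    seg = song(bounds(k):bounds(k+1));
    f0 = pitch(seg, fs, 'Method', 'PEF');      % Pitch Estimation Filter [19]
    PHz(1,k) = median(f0);                     % median frequency (Hz)
    PHz(2,k) = (bounds(k+1) - bounds(k)) / fs; % duration in seconds
end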

The MFE.m Code

In this section, the MATLAB code MFE.m, which gives a fair (and consistent) assessment of the attempt of a post-stroke patient to repeat a certain original tune, is described. The inputs of MFE.m are two matrices: the PHz matrix (produced by the SMM.m code described in the previous section) and the OHz matrix (created manually), containing the frequencies and time durations of the notes of the patient’s recording and of the original tune, respectively. In order to turn the assessment of a sung melody from a subjective matter into an objective one, the following rules are adopted:

1. The patient should not be penalized for not singing on the exact pitch as long as s/he produces the correct semitone intervals.

2. The patient should not be penalized if he/she sings at a faster or slower tempo as long as the patient preserves the correct ratios of note durations.

3. If the patient produces the original tune correctly but with added notes, then, for computing the error of the patient, the time durations of the original song and the note intervals at the points where the patient added notes are both set equal to zero.

4. Likewise, if the patient reproduces only a subsection of the original tune, then the time durations of the notes the patient failed to produce and the note intervals containing at least one missed note are both set equal to zero.

First, the MFE.m program converts the frequencies into notes by mapping A4 to 0.

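In semitones this is the standard mapping n = 12·log2(f/440); a short sketch (the variable names Pnotes and Onotes are illustrative, not necessarily those used in MFE.m):

% Sketch: semitones relative to A4, so 440 Hz maps to 0.
Pnotes = 12 * log2(PHz(1,:) / 440);     % patient
Onotes = 12 * log2(OHz(1,:) / 440);     % original tune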

There are three cases which are based on the number of notes the patient makes relative to the number of notes of the original tune.

Case 1: If the patient produces the same number of notes as the original tune, first rescale uniformly the durations of the time intervals of the patient’s recording to minimize the Euclidean distance between the original durations and the uniformly rescaled durations of the patient. Given two time vectors (T1, ..., Tk) and (t1, ..., tk) of the same length with non-negative coordinates, it can be seen by the Pythagorean Theorem (the minimizing vector x(t1, ..., tk) is the orthogonal projection of (T1, ..., Tk) onto the line spanned by (t1, ..., tk)) that the uniform rescaling factor x that minimizes the squared Euclidean distance (T1 − xt1)² + ··· + (Tk − xtk)² is given by x = (T1t1 + ··· + Tktk)/(t1² + ··· + tk²). This uniform rescaling of the patient’s time intervals ensures that the patient is not penalized for singing at a different tempo than the original song, as long as the correct ratios of the durations of the individual notes of the original tune are kept. After this rescaling, compute the Euclidean distance of the uniformly rescaled time intervals of the notes of the patient’s recording from the time intervals of the original tune. This is the time error of the patient. Also compute the Euclidean distance of the note intervals of the patient’s recording from the note intervals of the original tune. This is the note interval error of the patient.

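Under the notation of the previous sketch, Case 1 might be sketched as follows; this is an illustration of the stated formulas, not the published MFE.m code.

% Sketch of Case 1 (equal note counts).
T = OHz(2,:);  t = PHz(2,:);            % original and patient durations
xscale = (T * t') / (t * t');           % minimizing uniform rescaling factor
timeError = norm(T - xscale * t);       % time error (seconds)
noteIntervalError = norm(diff(Onotes) - diff(Pnotes));   % semitones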

Case 2: If the patient produces more notes than the original tune, then first select a submatrix of the matrix of the patient that has the same number of columns (each column corresponds to one note made) as the original and minimizes the Euclidean distance of the note interval error. Second, scale the time intervals of the matrix of the original tune to minimize the Euclidean distance of the time intervals between these two matrices of the same size. Third, enlarge the matrix of the original tune by adding auxiliary columns where the patient added notes. Each such auxiliary column contains a zero for the time interval and contains the same tone as the column on its left. Now the augmented matrix of the original song and the matrix of the patient’s recording have the same size, so the Euclidean distance of their top rows gives the note error, and the Euclidean distance of their bottom rows gives the time duration error. To understand the mentioned augmentation in the third step, assume for example that the patient produces the matrix

[n1 n2 n3; t1 t2 t3]

While the matrix of the original song was merely

[N1 N2; T1 T2]

(here n1, n2, n3 and N1, N2 stand for notes and t1, t2, t3 and T1, T2 stand for time durations; in this example the patient’s second note is the one that was added). Then the matrix of the original song is augmented to

[N1 N1 N2; T1 0 T2]

The note error becomes

√((n1 − N1)² + (n2 − N1)² + (n3 − N2)²)

The time error (computed after the rescaling of the second step) becomes

√((t1 − T1)² + t2² + (t3 − T2)²)


Case 3: If the patient produces fewer notes than the original tune, then first select a submatrix of the matrix of the original tune that has the same number of columns (each column corresponds to one note made) as the matrix of the patient’s recording and minimizes the Euclidean distance of the note interval error. Second, scale the time intervals of the matrix of the patient’s recording to minimize the Euclidean distance of the time intervals between these two matrices of the same size. Third, enlarge the matrix of the patient recording by adding auxiliary columns where the patient missed notes. Each such auxiliary column contains a zero for the time interval and contains the same tone as the column on its left. Now the augmented matrix of the patient’s recording and the matrix of the original tune have the same size, so the Euclidean distance of their top rows gives the note error, and the Euclidean distance of their bottom rows gives the time duration error.

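The published code for all three cases is in MFE.m [2]. The submatrix selection in the first step of this case could be sketched with a brute-force search over column choices; the use of nchoosek here is an assumption of the sketch, not necessarily the published method.

% Sketch: keep the m original columns whose note intervals best match
% the patient's m notes (assumes m < n).
m = size(PHz, 2);  n = size(OHz, 2);
best = Inf;
for cols = nchoosek(1:n, m)'            % each column is one candidate choice
    err = norm(diff(Onotes(cols')) - diff(Pnotes));
    if err < best
        best = err;
        keep = cols';                   % indices of the matched columns
    end
end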

Quality Control

To check that VASE has faithfully created an envelope of the decibels graph of the patient’s sung melody, run the program graphing.m, which can be found on GitHub [2]. The SMM.m code has been tested on 1,064 patient WAV files, and their envelopes have been checked using graphing.m with very satisfactory results.

System requirements

Operating system:

  • macOS: El Capitan (10.11)
  • Windows: Server 2012
  • Ubuntu: 14.04 LTS
  • Debian: 8
  • Red Hat: Enterprise Linux 6 (minimum 6.7)
  • SUSE: Linux Enterprise Server 12 (minimum SP2)

Programming language: MATLAB 2018a

Additional system requirements: Minimum processor: Any Intel or AMD x86-64 processor

Recommended disk space: 4-6 GB

Minimum RAM: 4 GB

Dependencies: Image Processing Toolbox, Audio System Toolbox

Software location

Code repository

Name: GitHub

Location: https://git.io/fx8rp

License: BSD 3-Clause "New" or "Revised" License

Date published: October 23, 2018

Conclusion

The significance of the presented MATLAB codes is that they can be easily implemented in other stroke research labs to evaluate the singing abilities of post-stroke patients. At the time of this writing, the computer programs described here are being used in an ongoing study to examine correlations between brain lesions and melodic repetition errors. Future studies to combine these computer programs with neural networks to investigate the correlation between melodic repetition errors and treatment recovery are of great interest.

Acknowledgements

I would like to thank Professor Fridriksson, head of the Aphasia Lab at the University of South Carolina, and the members of his lab (who can be found here: https://web.asph.sc.edu/aphasia/members/) for their support, help, and stimulating discussions during this research.

References

Citation: Androulakis A (2019) Encoding and Assessing Sung Melodies in Stroke Patients with Aphasia. J Neurol Disord 7: 404. DOI: 10.4172/2329-6895.1000404

Copyright: © 2019 Androulakis A. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
