Auditory tests for characterizing hearing deficits: The BEAR test battery

Introduction: The Better hEAring Rehabilitation (BEAR) project aims to provide a new clinical profiling tool, a test battery, for hearing loss characterization. Whereas the loss of sensitivity can be efficiently measured using pure-tone audiometry, the assessment of supra-threshold hearing deficits remains a challenge. In contrast to the classical 'attenuation-distortion' model, the proposed BEAR approach is based on the hypothesis that the hearing abilities of a given listener can be characterized along two dimensions reflecting independent types of perceptual deficits (distortions). A data-driven approach provided evidence for the existence of different auditory profiles with different degrees of distortions. Design: Eleven tests were included in a test battery, based on their clinical feasibility, time efficiency and related evidence from the literature. The tests were divided into six categories: audibility, speech perception, binaural processing abilities, loudness perception, spectro-temporal modulation sensitivity and spectro-temporal resolution. Study sample: Seventy-five listeners with symmetric, mild-to-severe sensorineural hearing loss were selected from a clinical population. Results: The analysis of the results showed interrelations among outcomes related to high-frequency processing and outcome measures related to low-frequency processing abilities. Conclusions: The results showed the ability of the tests to reveal differences among individuals and their potential use in clinical settings.

binaural processing, and spectro-temporal resolution, as well as a test of cognitive abilities. Importantly, while the auditory domains considered in the BEAR test battery are similar to the ones considered in the HEARCOM project, the BEAR project additionally aims to classify the patients into subcategories and to create a link between hearing capacities and hearing-aid parameter settings.

The tests included in the BEAR test battery were chosen based on the following criteria: 1) There is evidence from the hearing research literature that the considered test is informative and reliable; 2) The outcomes of the test may be linked to a hearing-aid fitting strategy; 3) The outcome measures are easy to interpret and to explain to the patient; 4) The task is reasonably time-efficient or can be suitably modified to meet this requirement (e.g., by changing the test paradigm or developing an out-of-clinic solution); 5) The test implementation can be done with equipment available in clinics;

The selected test battery included measures of audibility, loudness perception, speech perception, binaural processing abilities, spectro-temporal modulation (STM) sensitivity and spectro-temporal resolution. It was implemented and tested in normal-hearing (NH) and hearing-impaired (HI) listeners. The goals of the study were: 1) to collect reference data from a representative sample of HI listeners for each of the selected tests, 2) to analyse the test-retest reliability of these tests, 3) to analyse the relationships between the different outcome measures, and 4) to propose a version of the test battery that can be implemented in hearing clinics.

Analysis of test reliability
The test-retest reliability of the test battery was assessed using intraclass correlation coefficients (ICC; Koo & Li, 2016).
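Koo and Li (2016) describe several ICC forms, and this section does not state which one was used. As a minimal sketch, assuming a two-way random-effects, absolute-agreement, single-measurement model (ICC(2,1)) and the common definition SEM = SD·√(1 − ICC), with the SD taken over all test and retest scores, the two reliability statistics could be computed as follows. The labels use the Koo and Li (2016) guideline cut-offs (0.5, 0.75, 0.9). The function names and the example data are illustrative only.

```python
import numpy as np


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: array-like of shape (n_subjects, k_sessions), e.g. test and retest.
    """
    y = np.asarray(scores, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_subjects = k * np.sum((y.mean(axis=1) - grand) ** 2)  # between listeners
    ss_sessions = n * np.sum((y.mean(axis=0) - grand) ** 2)  # between sessions
    ss_total = np.sum((y - grand) ** 2)
    ss_error = ss_total - ss_subjects - ss_sessions
    ms_subjects = ss_subjects / (n - 1)
    ms_sessions = ss_sessions / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_sessions - ms_error) / n
    )


def sem_from_icc(scores, icc):
    """Standard error of measurement: SD of all scores times sqrt(1 - ICC)."""
    sd = np.std(np.asarray(scores, dtype=float), ddof=1)
    return sd * np.sqrt(max(0.0, 1.0 - icc))


def koo_li_label(icc):
    """Reliability label using the Koo & Li (2016) cut-offs."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Usage with made-up test/retest SRTs (dB SNR) for five listeners
test_retest = [[-2.0, -1.5], [0.5, 1.0], [3.0, 2.0], [-1.0, -0.5], [5.0, 5.5]]
icc = icc_2_1(test_retest)
print(round(icc, 2), koo_li_label(icc), round(sem_from_icc(test_retest, icc), 2))
```

Absolute agreement penalizes a systematic shift between sessions (for example, a learning effect). A consistency-type ICC would ignore such a shift.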

Speech perception in quiet
Methods
The word recognition score (WRS-4UFC) test was proposed as a systematic and self-administered procedure that allows the estimation of supra-threshold deficits in speech perception in quiet. The speech material was the same as the one used for standard

Results and discussion
The HI listeners' SRTQ were, on average, 20 dB higher than those of the NH group. The interquartile range for the HI group was about 19 dB whereas for the NH group it

The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Speech perception in noise
The Hearing in Noise Test (HINT; Nilsson et al., 1994) is an adaptive sentence recognition test carried out with speech-shaped noise. The following assumptions are considered in HINT: 1) Speech materials made of meaningful sentences yield a steep psychometric function; 2) Stationary noise with the same spectral shape as the average spectrum of the speech material makes the speech reception threshold in noise (SRTN) less dependent on the spectral characteristics of the speaker's voice. Furthermore, the signal-to-noise ratio (SNR) between the target and masker is better defined across the frequency range; 3) The SRTN is independent of the absolute noise level as long as the noise level is above the "internal noise" level. Therefore, it is recommended to present the noise at least 30 dB above the "internal noise". The internal noise is defined as the sum of the SRT in quiet of the tested listener and the SRT in noise for NH listeners, for a given speech material (Plomp, 1986).
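The adaptive principle behind HINT can be sketched as a one-up/one-down track on the SNR, which converges on the 50%-correct point of the psychometric function. This is a simplified illustration, not the Danish HINT implementation. It assumes a fixed 2-dB step and sentence-level correct/incorrect scoring. It also assumes that the SRTN estimate is the mean SNR from the fifth sentence onward, plus the SNR that would have been used for the next sentence, which loosely follows the scoring rule commonly described for the original HINT. The listener model and function names are hypothetical.

```python
from typing import Callable, List


def track_srtn(sentence_correct: Callable[[float], bool],
               start_snr_db: float = 0.0,
               step_db: float = 2.0,
               n_sentences: int = 20) -> float:
    """One-up/one-down adaptive track on SNR; returns an SRTN estimate in dB."""
    snr = start_snr_db
    presented: List[float] = []
    for _ in range(n_sentences):
        presented.append(snr)
        # Correct sentence -> make it harder; incorrect -> make it easier.
        snr += -step_db if sentence_correct(snr) else step_db
    # Average from the 5th sentence on, plus the SNR the next sentence would get.
    levels = presented[4:] + [snr]
    return sum(levels) / len(levels)


# Hypothetical deterministic listener whose true SRTN is -3 dB SNR
def listener(snr_db: float) -> bool:
    return snr_db >= -3.0


print(round(track_srtn(listener), 2))
```

With a steep psychometric function, the track oscillates within one step of the true SRTN. This is why a 2-dB step can still give an estimate with a measurement error below the step size.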

Methods
The Danish HINT was used as in Nielsen & Dau (2011) to obtain the SRTN. Additionally, a 20-sentence list was presented at a fixed signal-to-noise ratio of +4 dB and scored to obtain a sentence recognition score (SScore +4dB). The presentation level of the noise was set between 65 and 85 dB SPL to ensure that the noise was always presented 30 dB above the individual PTA. Each ear was tested individually. All participants were tested using the same list with the same ear. However, for the test-retest reliability study, the list and ear presented were randomized, using only lists 6-10.
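The level rule in the paragraph above amounts to clamping a target level of PTA + 30 dB to the 65-85 dB range. The sketch below assumes, as the text does, that the PTA and the noise level can be compared on the same dB scale. The function name is illustrative. The function also flags listeners for whom the 85-dB ceiling prevents the full 30-dB margin.

```python
def hint_noise_level(pta_db: float,
                     margin_db: float = 30.0,
                     min_level_db: float = 65.0,
                     max_level_db: float = 85.0) -> tuple[float, bool]:
    """Return (noise level in dB, whether the margin above the PTA is met)."""
    target = pta_db + margin_db
    level = min(max(target, min_level_db), max_level_db)
    return level, level - pta_db >= margin_db


for pta in (20.0, 50.0, 60.0):
    print(pta, hint_noise_level(pta))
```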

Results and discussion
The SRTNs for the NH listeners were, on average, 2 dB higher than those reported by Nielsen and Dau (2011). However, this might be explained by the fact that they used diotic presentation, which can lead to a 1.5-dB improvement (Plomp & Mimpen, 1979). The results also showed a lower SRTN (by 1.5 dB) and a higher SScore +4dB (by 4%) for the right ear in both groups of listeners. According to Nielsen and Dau (2011), there was a significant main effect of test list. Such differences are seen mainly for lists 1-4, which were the lists used here. Therefore, the observed interaural difference can be ascribed to a list effect.
The ICC values (SRTN: ICC = 0.61; SScore +4dB: ICC = 0.57) indicated only moderate reliability of the HINT. The SRTN showed an SEM of 1.02 dB, which is below the step size of the test (2 dB). The SScore +4dB showed an SEM of 7.94%, which corresponds to an error in one of the sentences.
Speech-in-noise tests can be a useful tool for characterizing a listener's hearing deficits and can be performed under different conditions, including monaural, binaural, unaided and aided presentation. While here the tests were

uncomfortable level (50 CU) and the hearing threshold (0.5 CU). The low-frequency (LF) average corresponds to frequencies below 1.5 kHz, and the high-frequency (HF) average corresponds to frequencies above 1.5 kHz.

Spectro-temporal modulation sensitivity
A speech signal can be decomposed into spectral and temporal modulations. Speech-in-noise assessment involves confounds because the variety of speech corpora, noise maskers and test procedures can all affect the results. For this reason, assessing sensitivity to simpler sounds might be of interest for characterizing a listener's spectro-temporal processing abilities. Bernstein et al. (2013) showed significant differences between NH and HI listeners in detecting STM in random noise. These differences corresponded to specific conditions that were also useful for predicting speech-in-noise performance in the same listeners. Recently, the assessment of STM sensitivity in these specific conditions has gained increasing interest due to its

The NH listeners showed a high sensitivity in the low-frequency condition (d' = 2.6)
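The following sketch makes the notion of an STM stimulus concrete by generating a "moving ripple". This is a dense set of log-spaced tones whose envelopes jointly form a sinusoidal modulation that drifts in time (rate, in Hz) and across frequency (density, in cycles per octave). It is a generic illustration under stated assumptions, not the BEAR or Bernstein et al. (2013) stimulus code. The band edges, rate, density, depth, number of tones and random phases are placeholder choices. In a detection task, a depth of 0 would give the unmodulated reference.

```python
import numpy as np


def moving_ripple(f_lo_hz, f_hi_hz, rate_hz, density_cpo, depth,
                  dur_s=0.5, fs=44100, n_tones=200, seed=0):
    """Spectro-temporally modulated tone complex, normalized to a peak of 1.

    depth: linear modulation depth in [0, 1]; 0 yields an unmodulated complex.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(round(dur_s * fs))) / fs
    # Log-spaced carrier frequencies and their position on an octave axis
    octaves = np.linspace(0.0, np.log2(f_hi_hz / f_lo_hz), n_tones)
    freqs = f_lo_hz * 2.0 ** octaves
    carrier_phases = rng.uniform(0.0, 2 * np.pi, n_tones)
    ripple_phase = rng.uniform(0.0, 2 * np.pi)
    signal = np.zeros_like(t)
    for f, x, ph in zip(freqs, octaves, carrier_phases):
        # The envelope drifts in time (rate) and across frequency (density);
        # flipping the sign of rate_hz reverses the ripple direction.
        envelope = 1.0 + depth * np.sin(
            2 * np.pi * (rate_hz * t + density_cpo * x) + ripple_phase)
        signal += envelope * np.sin(2 * np.pi * f * t + ph)
    return signal / np.max(np.abs(signal))


# Example: 4-Hz, 2-cycles/octave ripple on a 500-2000 Hz band (placeholder values)
stim = moving_ripple(500.0, 2000.0, rate_hz=4.0, density_cpo=2.0, depth=0.8)
```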

Binaural processing abilities
Binaural hearing is useful for sound localization and the segregation of complex sounds (Darwin, 1997). Interaural differences in level or timing are processed for spatial hearing purposes in the auditory system. With hearing loss, the neural signal at the

The frequency threshold (IPDfmax) was obtained from the average of two runs.

bilaterally. Ten diotic and 10 dichotic pitch contours, embedded in the noise, had to be detected by the listener. The tones forming the pitch contours were generated by adding frequency-specific IPDs to the presented noise (Cramer & Huggins, 1958). The outcome measure of the binaural pitch test was the percentage score averaged across two repetitions (BP20).

The IPDfmax test showed excellent reliability (ICC = 0.95; SEM = 65.4 Hz), and the median time needed for two repetitions was 10 minutes. This suggests that IPDfmax is a reliable measure of binaural processing abilities that can reveal substantial variability among both NH and HI listeners, which is valuable for highlighting individual differences among patients.
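The binaural pitch stimuli rely on the Huggins-pitch principle. A noise that is identical at the two ears, except for an interaural phase difference confined to a narrow frequency band, evokes a faint pitch near that band, although each ear alone receives only noise. The sketch below shows one common simplified variant: it inverts the phase (an IPD of π) in a narrow band of the right-ear noise via the FFT. It is not the BEAR implementation, and the bandwidth, duration and function name are assumptions. A pitch contour could be built by concatenating such segments with different centre frequencies.

```python
import numpy as np


def huggins_pitch(center_hz, dur_s=0.5, fs=44100, rel_bandwidth=0.06, seed=0):
    """Return (left, right) noise signals with an IPD of pi in a narrow band."""
    rng = np.random.default_rng(seed)
    n = int(round(dur_s * fs))
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = np.abs(freqs - center_hz) <= rel_bandwidth * center_hz / 2
    right_spectrum = spectrum.copy()
    right_spectrum[band] *= -1.0  # phase inversion only inside the band
    left = np.fft.irfft(spectrum, n)
    right = np.fft.irfft(right_spectrum, n)
    return left, right


# A diotic reference would simply present `left` to both ears.
left, right = huggins_pitch(600.0)
```

Because only the interaural phase differs, the magnitude spectra at the two ears are identical. Any pitch percept must therefore arise from binaural processing, which is what makes the task sensitive to binaural deficits.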

Results and discussion
The overall results from the binaural pitch test for the NH listeners showed >87.5% correct detection, whereas the HI listeners' results showed higher variability, with an interquartile range from 70 to 100%. The test showed excellent reliability (ICC = 0.98; SEM = 4%). Listeners reported a positive experience because the test was short and easy to understand.

Extended audiometry in noise (eAUD)
The extended audiometry in noise (eAUD) is a tone detection test intended to assess different aspects of auditory processing by means of a task similar to pure-tone audiometry (Sanchez-Lopez et al., 2020). The tone is presented either in noise or in quiet, and the listener has to indicate whether the tone was perceived or not. The aspects of auditory processing assessed here are 1) high-frequency audibility, 2) spectral and temporal resolution and 3) binaural processing abilities.

High-frequency audibility
Recently, elevated thresholds at high frequencies (>8 kHz) have been linked to the concept of "hidden hearing loss" and synaptopathy (

typically broader auditory filters leading to impaired frequency selectivity (e.g. Moore, 2007). Temporal resolution can be characterized by the ability to "listen in the dips" when the background noise is fluctuating, based on the so-called masking release (Festen & Plomp, 1990). Schorn and Zwicker (1990) proposed an elaborate technique for assessing both spectral and temporal resolution using two tests: 1) Psychoacoustical

2) eAUD-S: The tone is embedded in a TEN that has been shifted up in frequency. In the spectral domain, this yields spectral unmasking of the tone, so the detection threshold is lower than in eAUD-N.

showing temporal masking release (TMR) and spectral masking release (SMR).

Binaural masking release
Besides the binaural tests presented previously, another approach for evaluating binaural processing abilities is to assess binaural masking release (Durlach, 1963). This approach has been used in several studies (Neher, 2017; Strelcyk & Dau, 2009) and implemented in some commercial audiometers (Brown & Musiek, 2013). In this paradigm, a tone-in-noise stimulus is presented in two conditions: (1) a diotic condition where the tone is in phase in the two ears, and (2) a dichotic condition where the tone is in antiphase in the two ears. The difference between the two thresholds yields the benefit for tone detection due to binaural processing, the so-called binaural masking release (BMR).

The procedure used here was a yes/no task using a SIAM procedure (Kaernbach, 1990). As in traditional up-down procedures, the target may or may not be presented in a given trial. If the target was detected, the target-presentation level was decreased according to a given step size; if it was not detected, the level was increased. If the stimulus was not presented (catch trial) but the listener provided a positive response (a false alarm), the level was increased compared to the previous trial.
The target stimulus for all the conditions tested here was a warble tone. For each run, the first two reversals were discarded, and the threshold of the run was calculated as the average of the four subsequent reversals. The low-frequency (LF) condition corresponded to the detection of a 0.5-kHz warble tone, whereas the high-frequency (HF) condition corresponded to a 2-kHz warble tone. The final threshold was calculated as the mean threshold of two repetitions.
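The tracking logic can be sketched as follows. The sketch uses the adjustment matrix given by Kaernbach (1990) for a target performance of t = 0.5: a hit moves the level one step down, a miss one step up, a false alarm two steps up, and a correct rejection leaves it unchanged. It also applies the reversal rule described above: discard two reversals and average the next four. The catch-trial probability, step size and simulated listener are assumptions, not the BEAR settings, and the function names are hypothetical.

```python
import numpy as np


def siam_threshold(detects, start_level_db, step_db=2.0, target_p=0.5,
                   catch_p=0.3, n_reversals=6, n_discard=2,
                   max_trials=500, seed=0):
    """Yes/no SIAM track; returns the mean level of the retained reversals (dB)."""
    rng = np.random.default_rng(seed)
    # Adjustment matrix of Kaernbach (1990), in units of step_db
    hit, miss = -1.0, target_p / (1.0 - target_p)
    false_alarm, correct_rejection = 1.0 / (1.0 - target_p), 0.0
    level, last_direction, reversals = start_level_db, 0, []
    for _ in range(max_trials):
        target_present = rng.random() >= catch_p  # otherwise a catch trial
        said_yes = detects(level, target_present)
        if target_present:
            change = hit if said_yes else miss
        else:
            change = false_alarm if said_yes else correct_rejection
        direction = int(np.sign(change))
        if direction != 0:
            # A reversal is a change in the direction of the level steps
            if last_direction != 0 and direction != last_direction:
                reversals.append(level)
            last_direction = direction
        level += change * step_db
        if len(reversals) >= n_reversals:
            break
    kept = reversals[n_discard:n_reversals]
    return float(np.mean(kept)) if kept else float("nan")


def simulated_listener(level_db, target_present, true_threshold_db=40.0):
    """Hypothetical listener: detects targets at or above threshold, never false-alarms."""
    return target_present and level_db >= true_threshold_db


print(siam_threshold(simulated_listener, start_level_db=60.0))
```

Penalizing false alarms with an upward step is what makes SIAM robust to a liberal response criterion. A listener who says "yes" indiscriminately gains nothing, because the extra false alarms push the level back up.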
The outcome measures of the eAUD are 1) the high-frequency threshold (eAUD-HF), 2) the tone-in-noise threshold (eAUD-N), 3) the SMR, 4) the TMR, and 5) the BMR.
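Each masking release is a difference between two detection thresholds. The sketch below assumes that each release is the threshold in the reference condition minus the threshold in the release condition, so that positive values indicate a benefit. The condition keys are hypothetical labels, in particular "T" for the temporally modulated-noise condition, since those conditions are not spelled out in this section. The threshold values are invented.

```python
def masking_releases(thresholds_db: dict[str, float]) -> dict[str, float]:
    """Masking releases (dB) from one listener's thresholds at one frequency."""
    t = thresholds_db
    return {
        # Stationary TEN minus temporally modulated noise (hypothetical key "T")
        "TMR": t["N"] - t["T"],
        # Stationary TEN minus TEN shifted up in frequency (eAUD-S)
        "SMR": t["N"] - t["S"],
        # Diotic (tone in phase) minus dichotic (tone in antiphase)
        "BMR": t["N0S0"] - t["N0Spi"],
    }


example = {"N": 70.0, "T": 62.0, "S": 55.0, "N0S0": 66.0, "N0Spi": 51.0}
print(masking_releases(example))
```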

Results and discussion
The maximum frequency threshold for a tone presented at 80 dB SPL (eAUD-HF) was 11 kHz for the NH listeners and 8 kHz for the HI listeners. The HI group showed larger variability compared to the NH group (interquartile range: 6 kHz vs. 10 kHz). In contrast, the eAUD-N condition showed a larger variance for the NH group (SD = 4.5 dB HL) at low frequencies. The detection thresholds were in line with previous work, with thresholds close to the noise presentation level (70 dB HL) (Vinay, Hansen, Raen, & Moore, 2017). The TMR shown by the NH group was larger at high frequencies (10 dB) than at low frequencies (7 dB). The HI group showed, on average, similar TMR only at low frequencies. The SMR shown by the NH listeners was 19 dB for low frequencies and 26 dB for high frequencies. In contrast, for the HI listeners, the SMR was 7 dB lower only in the high-frequency condition. The BMR shown by both groups was around 15 dB, as expected from previous studies (Durlach, 1963).
The reliability of the eAUD was moderate for most of the conditions (ICC < 0.75). The eAUD-HF test showed very good reliability (ICC = 0.89; SEM = 495 Hz), and the eAUD-S at low frequencies showed good reliability (ICC = 0.85; SEM = 1.78 dB). The masking release estimates showed good reliability only for the high-frequency condition. The reason for this might be that masking release is a differential measure, so its cumulative error is higher than that of each individual measure. The reduced reliability can be explained to some extent by the method used: to make the procedure similar to pure-tone audiometry, the parameters of the SIAM tracking procedure were set accordingly.
However, this made the test challenging, and the listeners frequently gave false alarms on catch trials. Thus, extra trials were required to improve measurement accuracy. Even so, the standard error of measurement was in most cases larger than the final step size (2 dB). As in the case of the fSTM, a different procedure, such as a Bayesian adaptive method, might increase measurement reliability.
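The argument that a differential measure accumulates error can be made explicit. Assume that the measurement errors of the reference threshold T_ref and the release threshold T_rel are independent, with standard errors of measurement SEM_ref and SEM_rel. Their variances then add:

```latex
\mathrm{MR} = T_{\mathrm{ref}} - T_{\mathrm{rel}}, \qquad
\mathrm{SEM}_{\mathrm{MR}} = \sqrt{\mathrm{SEM}_{\mathrm{ref}}^{2} + \mathrm{SEM}_{\mathrm{rel}}^{2}}
```

For illustration only, suppose both thresholds had an SEM of 1.78 dB (the value reported above for eAUD-S at low frequencies). The SEM of their difference would then be about √2 × 1.78 ≈ 2.5 dB, which exceeds the 2-dB final step size.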

Exploratory analysis
The collection of tests included in the test battery was intended to explore different and potentially independent aspects of hearing to obtain an auditory profile with controlled interrelations among the tests. A factor analysis performed in the HEARCOM study

doi: https://doi.org/10.1101/2020.02.17.20021949 (medRxiv preprint)

The four factors resulting from the factor analysis explained 63% of the variance cumulatively. The variables with higher loadings (> 0.65) for each of the factors are shown in Table 3.
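For an orthogonal factor solution on standardized variables, the proportion of variance explained by a factor is the sum of its squared loadings divided by the number of variables. The salient variables are those whose absolute loading exceeds the chosen cut-off (0.65 here). The sketch below assumes an orthogonal rotation (e.g. varimax), which this section does not confirm. It uses a made-up loading matrix and hypothetical variable names, not the values in Table 3.

```python
import numpy as np


def summarize_loadings(loadings, names, cutoff=0.65):
    """Variance explained per factor, cumulative variance, and salient variables."""
    L = np.asarray(loadings, dtype=float)  # shape (n_variables, n_factors)
    explained = (L ** 2).sum(axis=0) / L.shape[0]
    cumulative = np.cumsum(explained)
    salient = {
        j: [names[i] for i in np.flatnonzero(np.abs(L[:, j]) > cutoff)]
        for j in range(L.shape[1])
    }
    return explained, cumulative, salient


# Made-up loadings for four variables on two factors
names = ["LF loudness slope", "WRS in quiet", "HF loudness slope", "SRTN"]
loadings = [[0.80, 0.10], [0.70, 0.20], [0.10, 0.90], [0.20, 0.30]]
explained, cumulative, salient = summarize_loadings(loadings, names)
```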

The first factor, in terms of the amount of variance explained (19%), was associated with LF loudness perception and speech intelligibility in quiet, whereas the second factor (18% of variance explained) was associated with HF loudness perception. Although loudness perception was associated with the first and second factors, the MCL was associated, both at high and low frequencies, with the third factor, while the fourth factor was associated with speech intelligibility in noise.

General discussion
The first goal of the present study was to collect data from a heterogeneous population of HI listeners, reflecting their hearing abilities in different aspects of auditory processing. The current study was motivated by the need for a new dataset to refine the data-driven

Relationships across different aspects of auditory processing
The proposed test battery considers outcomes divided into six dimensions of auditory processing. One of the objectives of the study was to investigate the interrelations of different dimensions and measures. The present analysis showed two interesting findings. First, the correlation analysis showed two clusters of variables related to either low- or high-frequency audiometric thresholds. Speech-in-noise perception was associated with high-frequency sensitivity loss and with temporal and spectral masking release, whereas speech-in-quiet perception was correlated with both low- and high-frequency hearing loss. Several outcomes were not interrelated, especially the outcomes associated with binaural processing abilities. Second, the factor analysis yielded latent factors related to low- and high-frequency processing, most comfortable level and speech in noise. Vlaming et al. (2011) found four dimensions in the factor analysis of the HEARCOM project data, corresponding to high- and low-frequency spectro-temporal processing, MCL and recruitment. In contrast, the current study showed that the slopes of the loudness growth, both at low and high frequencies, were not interrelated and contributed to the first and second latent factors. Additionally, the speech-in-noise test performed in HEARCOM was associated with low-frequency processing, whereas in the present study, speech in noise dominates the fourth factor and is significantly correlated with high-frequency hearing loss. The reason for this discrepancy might be the use of different types of noise and test procedures in the two studies.
Overall, the data of the present study seem to be dominated by the audiometric profiles, with low- and high-frequency processing reflecting the main sources of variability in the data.
However, binaural processing abilities, loudness perception and speech-in-noise outcomes showed a greater contribution to the variability of the supra-threshold measures than spectro-temporal processing outcomes.

Towards clinical feasibility of the tests
The test-retest reliability of the test battery was investigated based on the results of a subset of listeners who returned for a second visit 2-5 months after the first. The analysis was based on the ICC and the SEM. Some of the tests, such as IPDfmax, binaural pitch and FLFT, showed good to excellent test-retest reliability, with all ICC values above 0.9, while other tests, such as the extended audiometry in noise and speech intelligibility in quiet, showed poor reliability.

The analysis of the data showed that a reduced BEAR test battery has the potential for clinical implementation, providing relevant and reliable information reflecting several auditory domains. The proposed test battery showed good reliability, was reasonably time-efficient and was easy to perform. The implementation of a clinical version of the test battery is publicly available and can be evaluated in future research, e.g. in a larger field study to further refine the auditory profiling approach. Moreover, the current data will

clinical implementation of the test battery is publicly available at https://bitbucket.org/hea-dtu/bear-test-battery/src/master/.

Acknowledgements
We thank the staff from OUH, BBH and HEA, especially JH Schmidt, SS Houmøller, E Kjaerbøl, RS Sørensen, and the student helpers from the MSc of Audiology at SDU. The funding and collaboration of all BEAR partners are sincerely acknowledged. We also want to express our gratitude to all the participants in the study.