A Nonlinear Transmission Line Model of the Cochlea With Temporal Integration Accounts for Duration Effects in Threshold Fine Structure

Summary
For normal-hearing listeners, auditory pure-tone thresholds in quiet often show quasi-periodic fluctuations when measured with a high frequency resolution, referred to as threshold fine structure. Threshold fine structure depends on the stimulus duration, with smaller fluctuations for short than for long signals. The present study demonstrates how this effect can be captured by a nonlinear and active model of the cochlea in combination with a temporal integration stage. Since this cochlear model also accounts for fine structure and connected level-dependent effects, it is superior to filter-based approaches and hence allows the investigation of the contributions of cochlear and retro-cochlear processing to behavioural data, including stimulus-duration-dependent effects of threshold fine structure. © 2017 The Author(s). Published by S. Hirzel Verlag · EAA. This is an open access article under the terms of the Creative Commons Attribution (CC BY 4.0) license (https://creativecommons.org/licenses/by/4.0/).


Introduction
Normal-hearing listeners often show a quasi-periodic fluctuation in threshold level when thresholds are measured with a high frequency resolution. This is commonly referred to as microstructure [1] or fine structure [2] of the threshold in quiet. Other psychoacoustical effects such as binaural diplacusis have been argued to be linked to this threshold fine structure [3]. Since a minimum in threshold fine structure often coincides with the frequency of spontaneous otoacoustic emissions [4], it is assumed that the fine structure and the spontaneous otoacoustic emissions originate from the same cochlear mechanism. A nonlinear cochlear transmission-line model can predict these effects [5].
Threshold fine structure depends on duration. Cohen [6] showed that the threshold versus duration curve was steeper for a signal with a frequency coinciding with a minimum of the threshold fine structure (trough frequency) than for a signal with a frequency coinciding with a maximum of the threshold fine structure (peak frequency).
Cohen [6] argued that the different slopes of the threshold curves were due to the energy spread associated with decreasing signal duration. According to this explanation, the energy that spreads to surrounding frequencies when the signal is shortened could not be used for detection when the signal frequency was equal to a trough of the fine structure, since thresholds were higher at the surrounding frequencies. In contrast, for a peak frequency, as the bandwidth widened with decreasing signal duration, the energy spread to frequencies for which the detection threshold was lower than that at the signal frequency, i.e., these frequencies contributed to detection.
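To give a feeling for the size of this spectral spread, the following sketch computes the null-to-null width of the spectral main lobe, 2/T, of a tone of duration T. It assumes rectangular gating, which is a simplification and not necessarily the gating used in the experiments; the function name and the list of durations (8 to 512 ms in doubling steps) are likewise assumptions made for illustration.

```python
def main_lobe_width_hz(duration_s):
    """Null-to-null main-lobe width of a rectangularly gated tone (assumed gating)."""
    # The spectrum of a rectangular gate of length T has its first zeros at +/- 1/T.
    return 2.0 / duration_s


# Assumed durations: doubling steps covering the 8 to 512 ms range.
DURATIONS_MS = [8, 16, 32, 64, 128, 256, 512]

widths_hz = {d: main_lobe_width_hz(d / 1000.0) for d in DURATIONS_MS}
```

Under this assumption the main lobe of an 8 ms tone spans about 250 Hz, whereas that of a 512 ms tone spans only about 4 Hz, so only the short signals spread appreciable energy to neighbouring frequencies.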
This explanation assumes that each frequency (associated with a specific cochlear place) is linked to a certain sensitivity for this frequency. This can be realized by assuming a frequency-dependent gain of the cochlear amplifier with a higher gain at fine-structure minima than at maxima, i.e., a fine-structure filter. However, fine-structure effects on modulation detection thresholds are difficult to reconcile with this linear filter approach. Modulation detection thresholds were higher for a carrier frequency equal to the peak of the threshold fine structure than for a carrier frequency equal to the trough of the threshold fine structure [7]. The size of the effect depended on the carrier level, being largest when the carrier level was close to the threshold in quiet. This level dependence cannot be predicted on the basis of a linear filter approach.
Thus, a more realistic model of cochlear mechanics is required. Epp et al. [5] showed that a realistic cochlear model predicts fine-structure effects, including those found in modulation detection experiments.
The present study used the cochlear model of [5] as a basis for predicting the effect of duration on threshold fine structure. The aim was to investigate the ability of the model to account for duration effects in threshold fine structure, and to what extent the effect was cochlear or retro-cochlear in origin. To this end, experimental data on threshold fine structure measured in dB SPL were compared to model predictions using the cochlear model in its original implementation, and in combination with a leaky integrator as a retro-cochlear process.

Psychoacoustical experiment
Six normal-hearing subjects with audiometric thresholds < 15 dB HL at octave frequencies between 0.125 and 8 kHz participated in the experiments. Subjects were seated in a double-walled sound-attenuating booth. Signals were presented via Sennheiser HDA200 headphones. Before measuring thresholds as a function of stimulus duration, the individual fine structure was measured in various octave bands until a region showing sufficient fine structure was determined for each subject. As a screening procedure, the modified von Békésy tracking method of [2] was used. Adjacent peak and trough frequencies between 1 and 3 kHz were selected for each subject and used in the following temporal-integration experiments.
In the temporal-integration experiment, thresholds in quiet were measured using an adaptive three-interval, three-alternative forced-choice paradigm with a one-up two-down rule [8]. Consecutive intervals of a trial were separated by 400 ms of silence. One randomly chosen interval contained the signal. The task of the subject was to indicate the interval containing the signal. Depending on the response of the subject, the signal level for the next trial was increased or decreased. The initial step size was 6 dB. After the first upper reversal it was reduced to 3 dB and after the second upper reversal to 1 dB. The adaptive run continued for another eight reversals with this minimum step size. The mean of the levels at these eight reversals was taken as the threshold estimate. The final estimate of the individual threshold was the mean of the threshold estimates of three runs.
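The adaptive rule described above can be sketched as follows. This is a minimal illustration and not the software used in the experiment: the function names, the start level, the trial limit and the simulated listener (a logistic psychometric function with a guessing rate of 1/3) are assumptions, and a reversal is taken to be the level at which the track changes direction.

```python
import math
import random


def run_staircase(respond, start_level_db, steps_db=(6.0, 3.0, 1.0),
                  n_final_reversals=8, max_trials=2000):
    """One-up two-down track; returns the mean level at the final reversals."""
    level = start_level_db
    step_index = 0        # index into steps_db (6 dB, then 3 dB, then 1 dB)
    n_correct = 0         # consecutive correct responses at the current level
    direction = 0         # -1: track going down, +1: going up, 0: not yet defined
    reversal_levels = []  # reversals collected at the minimum step size
    for _ in range(max_trials):
        if respond(level):
            n_correct += 1
            if n_correct < 2:
                continue          # a second correct response is needed to go down
            new_direction = -1
        else:
            new_direction = +1    # a single incorrect response moves the level up
        n_correct = 0
        if direction != 0 and new_direction != direction:
            if step_index == len(steps_db) - 1:
                reversal_levels.append(level)
                if len(reversal_levels) == n_final_reversals:
                    return sum(reversal_levels) / n_final_reversals
            elif direction == +1:
                step_index += 1   # upper reversal: reduce the step size
        direction = new_direction
        level += new_direction * steps_db[step_index]
    raise RuntimeError("staircase did not finish within max_trials")


def make_listener(threshold_db, slope_db=1.0, seed=0):
    """Hypothetical simulated 3-AFC listener (assumed psychometric function)."""
    rng = random.Random(seed)

    def respond(level_db):
        p_detect = 1.0 / (1.0 + math.exp(-(level_db - threshold_db) / slope_db))
        p_correct = 1.0 / 3.0 + (2.0 / 3.0) * p_detect  # chance level is 1/3
        return rng.random() < p_correct

    return respond
```

A final estimate in the sense of the text would be the mean of three calls to `run_staircase`. A one-up two-down rule converges on the level that yields 70.7% correct responses.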

Simulations
A nonlinear and active transmission-line model of the cochlea with 1000 segments representing discrete cochlear partitions (CP) was used [5]. The active process in the cochlea was implemented as a combination of velocity-dependent damping and feedback stiffness in the equation of motion for each CP. Note that the implementation of the model in its current form only phenomenologically describes the macromechanics of the basilar membrane (BM); it neither models nor allows inference of physiologically realistic details about cochlear micromechanics. To account for fine-structure effects, a random variation of the place-frequency map (roughness) was introduced. This approach leads to self-sustained oscillations of the CPs and has successfully been used to predict fine structure of the threshold in quiet [5]. To this end, a velocity threshold was assumed. When the threshold was reached at any CP, the excitation was considered to be sufficiently strong to detect the signal. This "velocity-threshold model" was used here to investigate to what extent cochlear processing contributes to the effect of stimulus duration.
In addition to the velocity-threshold model, four modified versions of the model (referred to as "integrator models") were used. In the integrator models, a temporal integration stage was added to process the nonlinearly transformed cochleogram, i.e., the temporal output of each CP of the simulated cochlea over time: the absolute value of the cochleogram was raised to an exponent α, summed over all CPs $c_j$ and low-pass filtered using an integration window with a time constant τ. The four versions of the integrator model differed with respect to the time constant of the integration window (τ, 100 or 200 ms) and the exponent used to process the absolute value of the segment velocity (α, 2 or 3). Both time constants have been used for modeling of psychoacoustical data [9]. Temporal integration models commonly assume an integration of the intensity (i.e., α = 2). Some authors, however, argue that an exponent of 3 provides a better prediction of temporal integration data [10]. To show the influence of this parameter on the predictions, both values were included in the simulations. The summation over the output of all CPs $c_j$ represents an across-frequency integration.
The temporal window was implemented as a low-pass filter with an exponentially decaying impulse response, $w_\tau(t) \propto e^{-t/\tau}$ for $t \ge 0$. The impulse response $w_\tau(t)$ was sampled and truncated to a length of five times the time constant τ to obtain the discrete impulse response $w_\tau[n]$. Low-pass filtering was realized as
$$c_{\tau,\alpha}[n] = w_\tau[n] * \sum_j \left| c_j[n] \right|^{\alpha},$$
where $c_{\tau,\alpha}[n]$ represents the output of the integrator, $c_j[n]$ the velocity of the CP with index j, and $w_\tau[n]$ the integration window with time constant τ. The operator * denotes filtering including appropriate initial conditions to avoid onset effects in the filtering process. The initial conditions were obtained as the final state of the filter after processing a one-second period of self-sustained activity of the simulated CPs (without external stimulation).
The thresholds in quiet were simulated in each of the four investigated integrator models by assuming that a stimulus was detected if the value of $c_{\tau,\alpha}[n]$ approached a critical value κ within a tolerance of δ = 3% within the duration T of the stimulus. The value of κ was derived empirically as a scaled version of the output of the model integration stage after processing a one-second period of self-sustained activity (without external stimulation). The thresholds of the models were determined by varying the stimulus level in steps of 0.25 dB until the criterion value κ was reached.
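The integration stage and the detection criterion can be illustrated with a short sketch. It is written under several assumptions that are not taken from the original implementation: the window is scaled to unit sum, the filter starts from a zero state (the study instead used the final state after one second of self-sustained activity), the criterion is read as "the output comes within δ of κ or exceeds it", and the level search ascends from a start level. All function and parameter names are hypothetical.

```python
import math


def integration_window(tau_s, fs_hz):
    """Sampled exp(-t/tau), truncated to 5*tau; unit-sum scaling is an assumption."""
    n_taps = max(1, int(round(5.0 * tau_s * fs_hz)))
    taps = [math.exp(-n / (tau_s * fs_hz)) for n in range(n_taps)]
    total = sum(taps)
    return [w / total for w in taps]


def integrator_output(cochleogram, alpha, tau_s, fs_hz):
    """cochleogram[j][n] is the velocity of CP j at sample n."""
    n_samples = len(cochleogram[0])
    # |c_j[n]|^alpha summed over all CPs (across-frequency integration)
    summed = [sum(abs(cp[n]) ** alpha for cp in cochleogram)
              for n in range(n_samples)]
    window = integration_window(tau_s, fs_hz)
    out = []
    for n in range(n_samples):
        # Causal convolution with zero initial state (simplifying assumption)
        acc = 0.0
        for k in range(min(n + 1, len(window))):
            acc += window[k] * summed[n - k]
        out.append(acc)
    return out


def is_detected(output, kappa, delta=0.03):
    """Assumed reading of the criterion: output reaches kappa within tolerance delta."""
    return max(output) >= (1.0 - delta) * kappa


def find_threshold_db(detects, start_db, step_db=0.25, max_steps=1000):
    """Raise the level in 0.25 dB steps until detects(level) is True (assumed direction)."""
    for k in range(max_steps):
        level = start_db + k * step_db
        if detects(level):
            return level
    raise RuntimeError("criterion not reached")
```

In a full simulation, `detects` would run the cochlear model at the given level, pass its cochleogram through `integrator_output` and apply `is_detected`; here it is left as a callable so that the sketch stays independent of the cochlear model.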
The signals were the same as in the psychoacoustical experiment.

Results
Thresholds are given in decibels sound pressure level (dB SPL) of the signal prior to signal gating. For all subjects, thresholds decreased as the duration increased. The effect was larger for the trough frequency than for the peak frequency, in qualitative agreement with the data of [6]. Individual differences were observed with respect to the size of the effect and the difference between the two threshold curves. The maximum threshold difference between the two frequencies ranged from 6 dB (S6) to 11 dB (S1). Panel A of Figure 2 shows the average threshold-duration curves for the six subjects. The mean thresholds for the trough and the peak frequencies were about the same (14 dB) at a signal duration of 8 ms. For the peak frequency, thresholds decreased by about 2 dB per doubling of the duration down to 2 dB. In the range from 8 to 32 ms, the slope of the threshold curve for the trough frequency was considerably steeper (about −4 dB per doubling of duration) than for the peak frequency. A similar slope for the threshold curves of the two frequencies was observed for durations larger than 64 ms. The minimum threshold was −4.5 dB for the trough frequency and a duration of 512 ms.
A two-way ANOVA (SPSS) was used to examine the effect of stimulus duration ("duration", seven levels) and position with respect to the threshold fine structure ("position", two levels: peak or trough). The effect of "duration" on the thresholds was significant (F(6,30) = 488.3, p < 0.001). The factor "position" also had a significant effect on the thresholds (F(1,5) = 30.4, p < 0.01). There was a statistically significant interaction between "position" and "duration" (F(6,30) = 21.8, p < 0.001), i.e., the position with respect to the threshold fine structure had a significant effect on the shape of the threshold curve.
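The slope quoted for the mean peak-frequency data can be cross-checked with simple arithmetic. The sketch assumes that the approximate mean thresholds of 14 dB SPL and 2 dB SPL belong to the shortest (8 ms) and longest (512 ms) durations; the function name is illustrative.

```python
import math


def db_per_doubling(level_start_db, level_end_db, dur_start_ms, dur_end_ms):
    """Average threshold change per doubling of duration between two points."""
    n_doublings = math.log2(dur_end_ms / dur_start_ms)
    return (level_end_db - level_start_db) / n_doublings


# Assumed end points of the mean peak-frequency curve: 14 dB at 8 ms, 2 dB at 512 ms.
peak_slope_db = db_per_doubling(14.0, 2.0, 8.0, 512.0)
```

Going from 8 to 512 ms corresponds to six doublings, so a drop of 12 dB gives a slope of −2 dB per doubling, consistent with the value stated for the peak frequency.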
Panel B of Figure 2 shows simulated thresholds using the velocity-threshold model. Thresholds for the peak and trough frequencies were about the same for the 8 ms signal duration. For the peak frequency, thresholds were constant for durations of 16 ms and longer. For the trough frequency, thresholds decreased up to a duration of 128 ms and did not depend on duration for longer signals. The difference in thresholds between the peak and the trough frequencies was determined by the simulated fine structure at these frequencies.
Panels C and D of Figure 2 show the predictions using the integrator models with values of α = 2 (panel C) and α = 3 (panel D) in combination with time constants of τ = 100 ms (upward pointing triangles) and τ = 200 ms (downward pointing triangles). The models predicted, consistent with the data, highest thresholds for short durations, and decreasing thresholds for increasing durations. The decrease in thresholds from 8 ms to 512 ms duration was larger for the trough frequencies than for the peak frequencies. The differences between thresholds for the trough and the peak frequencies at each duration increased with increasing stimulus duration, asymptotically approaching the value of the threshold fine structure. For the exponent α = 2, the initial decrease between 8 and 16 ms was slightly steeper, and the decrease as a function of duration showed a larger curvature than the measured data. The initial decrease in the predictions was mainly determined by the exponent, being shallower for an exponent of α = 3 than for an exponent of α = 2. The curvature was determined by the time constant τ. A longer time constant led to a larger decrease in threshold for durations between 16 and 512 ms than a shorter time constant. Simulated thresholds for a time constant of τ = 100 ms (light blue symbols) were slightly higher than for τ = 200 ms (dark blue symbols), and the predicted decrease of thresholds was steeper for the models using α = 2 (panel C) than for the models using α = 3 (panel D).
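The dependence of the predicted slopes on α and τ can be illustrated with an idealised calculation. The sketch is not the cochlear model: it assumes a single leaky integrator driven by a steady tone of amplitude A, so that the integrator output after T seconds is proportional to A^α (1 − exp(−T/τ)), and it ignores cochlear nonlinearity, self-sustained activity and the summation across CPs. Under these assumptions the level needed to reach a fixed criterion is −(20/α) log10(1 − exp(−T/τ)) dB relative to the long-duration asymptote. The function names are illustrative.

```python
import math


def relative_threshold_db(duration_s, tau_s, alpha):
    """Level (dB re. long-duration asymptote) needed to reach a fixed criterion."""
    growth = 1.0 - math.exp(-duration_s / tau_s)  # build-up of the integrator output
    return -(20.0 / alpha) * math.log10(growth)


def decrease_db(dur_short_s, dur_long_s, tau_s, alpha):
    """Threshold decrease when the duration grows from dur_short_s to dur_long_s."""
    return (relative_threshold_db(dur_short_s, tau_s, alpha)
            - relative_threshold_db(dur_long_s, tau_s, alpha))


# Initial decrease (8 to 16 ms) for both exponents, tau = 200 ms
initial_alpha2 = decrease_db(0.008, 0.016, 0.2, 2)
initial_alpha3 = decrease_db(0.008, 0.016, 0.2, 3)

# Decrease between 16 and 512 ms for both time constants, alpha = 2
late_tau100 = decrease_db(0.016, 0.512, 0.1, 2)
late_tau200 = decrease_db(0.016, 0.512, 0.2, 2)
```

In this idealisation the initial decrease is close to 3 dB per doubling for α = 2 and close to 2 dB per doubling for α = 3, and the decrease between 16 and 512 ms is larger for τ = 200 ms than for τ = 100 ms, in line with the trends described for the simulations.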

Discussion
The simulations (cf. Figure 2) showed that differences in the slope between trough and peak frequencies at short durations (8 to 32 ms) were accounted for by the cochlear model. However, for the steady decrease of thresholds with increasing duration, a retro-cochlear temporal-integration stage was required. This difference in slope between trough and peak frequency was more pronounced in the simulations than it was in the data. This might indicate that the temporal integration stage used was oversimplified. With the current choice of the integration window and the values of α and τ, the combination of a time constant of τ = 200 ms and an exponent of α = 3 provided the best fit to the data.
One should keep in mind that the current approach of simulating temporal integration with one leaky integrator might not be realistic. Recently, [9] showed that temporal integration of loudness may be better modelled by assuming two parallel leaky integrators instead of a single leaky integrator. Some models of temporal integration did not even require a leaky integrator, but used probabilistic theories of sound detectability that explained detection by a probability accumulation over time [11]. Independent of the retro-cochlear processing, fine-structure effects are inherent when using a model as described in [5], and hence the combined model should be able to account for the data.
In summary, the difference in the slope of the threshold curves for the peak and trough frequencies could be explained by two elements: A) a mechanical processing stage including inherent properties of a physiologically plausible cochlear model to account for duration- and frequency-dependent spread of excitation; and B) retro-cochlear temporal integration of the transformed output of the cochlea with a time constant of about 100-200 ms to account for the monotonic decrease.

Figure
Figure 1 shows the level at threshold as a function of signal duration. Each panel shows individual data for the trough frequency (filled circles) and adjacent peak frequency (open circles) of the individual threshold fine structure.

Figure 1. Detection thresholds of pure tones with signal frequencies corresponding to peaks (open circles) and troughs (filled circles) of threshold fine structure. Each panel shows the detection thresholds for one subject and the intraindividual standard deviations. Apart from the subject abbreviation, each panel also contains the peak and trough frequencies for the subject.

Figure 2. Same as Figure 1 but showing average detection thresholds and interindividual standard errors (panel A) and model predictions using different model versions (for details see text). The mean data of panel A are redrawn in panels B-D (dashed lines). For better comparison of the data and simulations, the mean data are normalized to the predictions for 512 ms duration and τ = 200 ms.