Wednesday, 28 July 2010

sound comparisons

Warren Maguire’s comments on yesterday’s blog remind me that I have not previously written about the interesting Sound Comparisons website.

This is the showcase for a research project conducted at the University of Edinburgh in 2005–2007. The website offers you ‘sound comparisons’ for about a hundred English words pronounced in fifty or so different native-speaker accents, mainly but not exclusively British. They are presented in narrowish IPA transcription, and for many (but not all) there are sound clips. There’s no connected speech.

Unusually, there are also a dozen or so ‘historical’ accents/varieties covered, ranging from Proto-Germanic to Shakespearean. Strangely, no native speakers of these varieties seem to have been available to offer recordings.

There are also a dozen or so ‘other Germanic’ varieties, giving the cognates in the relevant languages of the items in the English word list. If you’ve always wanted to listen to the Frisian word for ear, this is where to find it. (It’s iˑər, which you could easily take as some kind of post-Gaelic Scottish.)

With my browser at least the sound clips are rather flaky: you tend to get two plays (perhaps overlapping) of the word you ask for, followed shortly or indeed after quite a time by other words seemingly chosen at random.

Presumably because the research money was not renewed, the website now gives the impression of having been abandoned. I hope this is not the case: it would be nice to have the missing sound files, e.g. for London or Norwich. It would be nice to have some connected speech. It would be nice to have more accents represented.

Perhaps Warren Maguire’s ongoing research will fill some of these gaps.

15 comments:

  1. The trouble seems to be that the files play when one hovers the cursor over the transcription, no need to click on it.

    I see the Danish 'soft d' is transcribed as [lʔ]. I keep hearing that it supposedly sounds like an /l/ to foreigners, but I cannot for the life of me hear it.

    Something also seems odd about the vowel qualities. Unless my English pronunciation is completely lopsided of course.

    The speaker sounds native at least.

  2. I must speak up for the site. For a while I got poor quality and slow speed on Safari and had to use my unpreferred Firefox. But then they improved it hugely.

    As Sili says, the trick is to hover rather than click, and to be careful where the cursor goes immediately after.

  3. I agree with you, John - that was an appalling design decision, to have the sound files play when you roll the mouse over the link.

    Some of the transcriptions are pretty flaky too. Looking at the Boston sample, for instance, the transcriber hasn't done a good job at all in distinguishing rhotic and non-rhotic instances of the vowel at the end of brother etc.

    None the less, a useful resource just for being able to hear the sound clips. I hope they get to finish the site.

  4. Something that confused me at first was the discrepancy between the transcriptions and the recordings in a few cases. Here is the explanation:

    A few of the transcriptions are based on the speech of more than one speaker from the given location, so that not all transcriptions correspond exactly to the associated recording.

  5. Thanks John for your discussion of our website, and for all the useful comments submitted. Regarding the current state of play with it, well it is still being managed (by Paul Heggarty, who created it and who is best placed to answer questions about functionality), but we've not been adding to it for the reason John has suggested. I think it would be great to expand it, but it would be quite a job, requiring funds we don't currently have, to collect the data, transcribe them, segment the soundfiles, and code the website.

    Regarding the issue of the mismatch between the soundfiles and the transcriptions, there's a number of reasons:

    1) Undoubtedly there are some transcriber errors and idiosyncrasies (I did all the transcriptions), most likely in varieties I am not at all familiar with (e.g. Danish); but hopefully that's not the main problem;
    2) The phonetic transcription was done a particular way so that it could be fed into a similarity measurement algorithm, so that there are some idiosyncratic features of it; but that's essentially an issue of representation;
    3) Some of the speakers, especially the ones giving the traditional dialect forms, gave more than one pronunciation of each word. Only one is given on the website, and since the transcriptions and the soundfile segmentation were done by different people, there's a chance of mismatch;
    4) The most important issue is, as Jongseong points out, that the soundfiles and the transcriptions don't necessarily refer to the same thing. This is a result of the transcription procedure, which wasn't initially done with a website in mind. For many of the localities, a number of speakers were recorded. For our dialect similarity study, we wanted a representative transcription from each location; sometimes this was the transcription for one typical speaker, other times it was a composite transcription which included the most common variants across a range of speakers. So, for example, in Boston I recorded about 12 or 13 people reading our wordlist. Three or four of these were older with very local accents, and we made a composite transcription of their most common variants which we labelled 'Boston Traditional'. Likewise, we made a composite transcription based on the speech of the other Boston speakers (who were almost all male, middle-aged and upper working-class) which we labelled 'Boston Typical'. These composite transcriptions are 'averages' of the speakers sampled, so to speak. Towards the end of the project we felt it would be a nice idea to make a selection of our data available on the internet, and, with limited time and money, we matched up each of the transcriptions we used in our algorithm with one of the recordings it was based on. This means they don't always match exactly (so, for example, the Boston soundfiles have some obviously non-rhotic pronunciations matched with transcriptions indicating rhoticity and vice versa), so it is best to think of the soundfiles and transcriptions as two different sets of data from a given location.

    I did also collect conversational data at quite a few of the locations, and at some point I may put samples of this up on my own website.

    Anyway, for further details of the research which gave rise to the Sound Comparisons website, the elicitation techniques and some details of the transcription procedure, have a look at this paper: Warren Maguire, April McMahon, Paul Heggarty and Dan Dediu (2010)
    "The past, present, and future of English dialects: Quantifying convergence, divergence, and dynamic equilibrium", Language Variation and Change, 22(1), pp 69-104. Comments and questions welcome!

  6. Very cool! I also noticed in the Boston sample that the speaker used his PALM-START vowel in BATH which is quite exceptional coming from a North American. The transcription there is wrong too, but that's fine.

  7. I also remember being surprised to hear a university lecturer, a Boston resident, consistently using a PALM vowel for can't. I don't remember hearing it from any other Bostonian, so it might be just him.

  8. Well you don't hear the so-called "Broad A" in (a few) BATH words from many Bostonians these days. Especially not younger ones, as far as I know. You just have to find the right people.

  9. I object to Standard Canadian [ˈdɔ̞ˑɾɹ̩] for 'daughter' and to my ears it doesn't even match the voice sample which is nonetheless authentically Canadian. How can it be [ɔ]?? I'm pronouncing it as I could have sworn most Canadians pronounce it... with [ɑ], just like many Americans. I'm sure that's a typo, even more sure because Australia:Perth 'daughter' is written with a flap but pronounced in the voice sample clearly as an aspirate alveolar stop.

    And having designed sites myself, I confirm that voice samples triggered by simple mouse hovering are just plain irritating, not only because merely moving the mouse causes an audial spray of garbage inadvertently by the user (hence user confusion) but because these hover areas look exactly like links (more user confusion), causing a two-fold problem of echo (and predictable user anger). Some mistakes are honest mistakes but this is so horrible that it could only be by incompetence I'm afraid. That this obvious design flaw is funded by a grant makes me more irritated. Give me a grant and I'll fix it then. Bah.

    Nonetheless, the content could be highly useful if there was more content, more organized and regularly updated. I hate abandoned university sites because it makes it look like academics are only in it for fame and cheap grant money rather than to reach outside of their ivory tower and provide helpful educational services to the public. Grrr. Oh why oh why did you show me that dysfunctional site, lol! Now my day is ruined.

  10. Glen
    Well to be fair it does mark the ɔ as lowered: [ˈdɔ̞ˑɾɹ̩], though I agree with you it is bizarre to think of it as any sort of ɔ. Some time ago I was told off by a US NS on here for using ɒ for NAm speakers who don't merge it with ɑ, and several of us agreed that that symbol would be a better choice for it acoustically and impressionistically speaking, and not only from the point of view of BrE speakers, but from the point of the IPA norms, though obviously not for NAm speakers who are used to seeing the unmerged phoneme represented as ɔ. More recently JW himself has said here that he considered using the phonetically more appropriate ɒ for it in LPD, but decided against it. I think he argued then that one should not multiply such superficial transcriptional distinctions, but I still think it's more of a heritage issue, like AmE oʊ vs BrE əʊ, although there are realizations on both sides of the Atlantic that overlap, and əʊ itself replaced the oʊ that was used to represent the old RP pronunciation.

    So I might have gone along with [ˈdɒˑɾɹ̩], and that's what that sound file sounds like to me, but I am not a Canadian. However I have heard enough Canadian English to believe you when you say most Canadians pronounce it with [ɑ]. In fact I could have sworn a more tendentious oath than you: that I had never heard a Canadian in Canada or out of it who did not have the LOT-THOUGHT merger realized with something more closely resembling [ɑ] than [ɒ]. Perhaps I have now, or perhaps this sound file is of a posh Canadian who pronounces all his [ɑ]s like that!

    But I will concede that it's not very confidence-inspiring that the transcriptions of the test sound files of the word "right" on the home page of soundcomparisons, even if we only take the realizations of the final /t/, are a bit erratic: [rəit] for one that is strongly aspirated (aspiration, though not assibilation, is conscientiously shown on the others), and [ɹäˑɪt] for one with no [t] at all, barely even a gesture in the direction of a glottal stop.

  11. Mallamb,

    Here in Canada, I will bet my life and the life of my entire family that unrounded [ɑ] is the standard, not [ɒ] in words like "law", "lot" and "thought". My perception is that [ɒ] sounds British-influenced (even snooty if put on for airs), although perhaps it exists in Newfoundland English which deviates quite a bit from the rest of Canada and has a strong Irish lilt.

    But [ɔ] is from some extra-terrestrial dialect that I don't recognize. ET phone home, people! ;o)

    I gather that the site's audio/transcription mismatch must be because people from representative dialect and language areas were simply told to pronounce the words naturally while the transcriptions had been written out and matched to the audio files separately. The less moronic way of doing this would be to collect the audio files and properly transcribe each one after actually listening to them.

  12. Okay, I'm not crazy. Hooray! The Cambridge History of the English Language: English in North America (2001), p.428 confirms that the vowel in question is either [ɑ] or [ɒ] in Canadian English, although I just never hear [ɒ] from any native Manitoban. Perhaps this is because we freeze in the winter time and we can't be bothered to round our lips in minus 40 degree weather, hehe.

  13. One last message, following up on my uncertainty about the speech of our fellow Newfies. If we check the audio files of three Newfoundlanders, there is no perceptible rounding of this vowel in question whatsoever. In fact, I would transcribe the pronunciation of "modern" of the 1st and 3rd speakers as [ˈmaɾɚn] with a very fronted [a] while the 2nd speaker sounds as if he's been mostly assimilated into the mainstream Canadian accent and uses [ɑ]. By comparison, I say [ˈmɑɾɚn].

  14. Oxford Canadian Dictionary uses [ɒ] for the Cot/Caught vowel(s).

  15. Furthermore, this speech sample has a lot of cot/caught vowel rounding:

    http://web.ku.edu/~idea/northamerica/canada/alberta/alberta.htm
