G-1SJL7 Edinburgh Associative Thesaurus SR
Included in collections
- Collection Networks
Properties
- Order: 23219
- Size: 0
- Minimum degree: 0
- Maximum degree: 1108
- Diameter: 7
- Clique number: 5
- Connected: false
- Arcs: 325589.0
- File size: 4864
- Average degree: 28.045
- Strong components: 15466
- Weak components: 2
- Modes: 1
- Temporal: false
- Multirelational: false
- Directed: directed
- Real: true
- Genealogy: false
- Multiple lines: false
- Weighted: true
- Minimum weight: 1.0
- Maximum weight: 91.0
- Loops: true
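As a quick consistency check on the properties above, the listed average degree can be reproduced from the order and the number of arcs. This is a minimal sketch that assumes the average degree counts both the in-degree and the out-degree of each vertex; the variable names are illustrative.

```python
# Values copied from the property list above.
order = 23219   # number of vertices
arcs = 325589   # number of directed arcs

# Assumption: every arc adds one to the out-degree of its source and one
# to the in-degree of its target, so the mean total degree is 2 * arcs / order.
average_degree = 2 * arcs / order

print(round(average_degree, 3))  # 28.045
```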
This network is the second part of a 3-set network of the Edinburgh Associative Thesaurus (EAT). Each arc connects stimulus X with response Y and carries a weight N, the number of times Y was given as the response to X.
This is the EATsr dataset. The other two parts are the EATrs and EATnew datasets.
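The arcs can be read into (stimulus, response, count) triples with a few lines of code. The sketch below assumes the usual Pajek `.net` layout (a `*Vertices` section with numbered, quoted labels followed by an `*Arcs` section with `source target weight` lines); the function name `parse_pajek` and the sample text are hypothetical and not part of the dataset.

```python
import re

# A vertex line looks like:  1 "CAT"
VERTEX = re.compile(r'^(\d+)\s+"([^"]*)"')

def parse_pajek(text):
    """Return (labels, arcs) from Pajek .net text.

    labels maps vertex number -> label; arcs is a list of
    (stimulus, response, weight) triples using the labels.
    """
    labels, arcs = {}, []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            # Section headers such as "*Vertices 3" or "*Arcs".
            section = line.split()[0].lower()
            continue
        if section == "*vertices":
            match = VERTEX.match(line)
            if match:
                labels[int(match.group(1))] = match.group(2)
        elif section == "*arcs":
            parts = line.split()
            source, target = int(parts[0]), int(parts[1])
            # A missing weight is treated as 1.0 (assumption).
            weight = float(parts[2]) if len(parts) > 2 else 1.0
            arcs.append((labels[source], labels[target], weight))
    return labels, arcs

# Toy input, not taken from the real file.
sample = '''*Vertices 3
1 "CAT"
2 "DOG"
3 "BONE"
*Arcs
1 2 5
2 3 2
'''
labels, arcs = parse_pajek(sample)
print(arcs)
```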
This network should be equal to the transposed (mirror) version t(EATrs) of EATrs. However, this is not the case. There are some differences:
SR - t(RS):
999.BELLOW 1
t(RS) - SR:
30.=*= 17
ULCER.=*= 1
THIRTY.=*= 1
PERIOD.=*= 1
There were also 32 multiple lines. Since the weights on the parallel arcs were the same we treated them as duplicates and preserved only a single arc. The 'corrected' version is saved in EATnew.
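The comparison and clean-up described above can be sketched as follows: transpose the RS arcs, take set differences in both directions, and keep a single copy of arcs that repeat with identical endpoints and weight. This is only an illustration of the idea, not the procedure actually used to build EATnew; the function names and the toy arcs are hypothetical.

```python
def transpose(arcs):
    """Swap source and target of every (source, target, weight) arc."""
    return [(target, source, weight) for (source, target, weight) in arcs]

def drop_duplicate_arcs(arcs):
    """Keep one copy of arcs that repeat with the same endpoints and weight."""
    seen = set()
    unique = []
    for arc in arcs:
        if arc not in seen:
            seen.add(arc)
            unique.append(arc)
    return unique

def arc_difference(a, b):
    """Arcs present in a but not in b, sorted for stable output."""
    return sorted(set(a) - set(b))

# Toy data shaped like the differences listed above (not the real files).
sr = [("CAT", "DOG", 5), ("CAT", "DOG", 5), ("999", "BELLOW", 1)]
rs = [("DOG", "CAT", 5), ("=*=", "30", 17)]

sr_clean = drop_duplicate_arcs(sr)
sr_minus_trs = arc_difference(sr_clean, transpose(rs))
trs_minus_sr = arc_difference(transpose(rs), sr_clean)

print(sr_minus_trs)  # [('999', 'BELLOW', 1)]
print(trs_minus_sr)  # [('30', '=*=', 17)]
```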
Background:
The Edinburgh Associative Thesaurus is a set of word association norms showing the counts of word association as collected from subjects. This is not a developed semantic network such as WordNet, but empirical association data.
The traditional way to collect word association norms is to show or say a word to several people and ask them to say the word which first comes to their minds upon receiving the stimulus. The link established between the stimulus and the response is not semantically labelled (e.g. as synonym, antonym or by a case relation) and can only be regarded as an association.
The Edinburgh association norms were collected by growing the network from a nucleus set of words. Responses were collected to words in this nucleus set, then these responses were used to obtain further responses, and so on. In fact the cycle was repeated about three times since by then the number of different responses was so large that they could not be re-used as stimuli. Data collection stopped when 8400 stimulus words had been used. Each stimulus word was presented to 100 different subjects, each of whom received 100 words. This gave rise to a total of 55732 nodes in the Thesaurus network.
The subjects were mostly undergraduates from a wide variety of British universities. The age range of the subjects was from 17 to 22 with a mode of 19. The sex distribution was 64 per cent male and 36 per cent female. The data was collected between June 1968 and May 1971.
The database consists of two files: the SR (stimulus-response) file and the RS (response-stimulus) file. Where words have been truncated to 19 characters to save space, the per cent character (%) has been placed as the 20th character.
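The truncation convention can be illustrated with a small sketch. It assumes that the first 19 characters of a long word are kept and that a stored label of exactly 20 characters ending in % marks a truncated word; the helper names are hypothetical.

```python
MAX_LEN = 19   # characters kept from a long word
MARK = "%"     # marker placed as the 20th character

def store_word(word):
    """Return the word as it would be stored under the assumed convention."""
    if len(word) > MAX_LEN:
        return word[:MAX_LEN] + MARK
    return word

def is_truncated(stored):
    """True if a stored label carries the truncation marker in position 20."""
    return len(stored) == MAX_LEN + 1 and stored.endswith(MARK)

stored = store_word("ELECTROENCEPHALOGRAPHY")
print(stored, is_truncated(stored))
print(store_word("CAT"), is_truncated("CAT"))
```

When matching labels across the SR and RS files, truncated labels should be compared as stored, since the dropped characters cannot be recovered from the file itself.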
The EAT here is the one that is included in the MRC Psycholinguistic Database, for use with the other measures available there.
EAT Data Collection Procedure
Stimulus words
Since the objective was to obtain a reasonably large complete mapping of the associative network for a large set of words, a systematic procedure of 'growing' the network from a small nucleus was followed. At first responses were obtained from this nucleus set, then these responses were used as stimuli to obtain further responses, and so on. In fact, this cycle was repeated about three times, since by then the number of different responses was so large that they could not all be re-used as stimuli.
The nucleus set was derived from (a) the 200 stimuli used in the Palermo and Jenkins (1964) norms, (b) the 1,000 most frequent words of the Thorndike and Lorge (1944) word frequency count, and (c) the basic English vocabulary of Ogden (1954).
Data collection was stopped when 8,400 stimulus words had been used. Only a minimal amount of selection of stimuli was applied in each cycle of the data collection. Effectively all responses which were English words or meaningful verbal units were included, including some phrasal forms and numerals. The data cover a wide range of grammatical form classes and inflexional forms.
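The 'growing' procedure resembles a breadth-first expansion with a cap on the number of stimuli. The sketch below is a loose simulation under stated assumptions: `collect_responses` is a hypothetical stand-in for presenting a stimulus to subjects and tallying their answers, the cycle count is fixed at three, and responses are reused in the order they first appear. The real study applied human judgement that is not modelled here.

```python
def grow_network(nucleus, collect_responses, max_stimuli=8400, cycles=3):
    """Expand a stimulus set cycle by cycle and record weighted arcs.

    collect_responses(stimulus) must return a dict mapping each response
    word to the number of subjects who gave it (hypothetical interface).
    """
    used, seen, arcs = [], set(), []
    frontier = list(nucleus)
    for _ in range(cycles):
        next_frontier = []
        for stimulus in frontier:
            if stimulus in seen:
                continue
            if len(used) >= max_stimuli:
                return used, arcs  # stop once the cap is reached
            seen.add(stimulus)
            used.append(stimulus)
            for response, count in collect_responses(stimulus).items():
                arcs.append((stimulus, response, count))
                if response not in seen:
                    next_frontier.append(response)
        frontier = next_frontier
    return used, arcs

# Toy association table standing in for real subjects.
toy = {"CAT": {"DOG": 2}, "DOG": {"CAT": 1, "BONE": 1}, "BONE": {}}
used, arcs = grow_network(["CAT"], lambda w: toy.get(w, {}), max_stimuli=2)
print(used)
print(arcs)
```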
Procedure
Each stimulus word was presented to 100 different subjects. Each subject received a computer-printed sheet with 100 stimuli in randomised arrangement (to minimize priming effects). The total contribution of each subject was thus 100 responses. The verbal environment of each word for each subject was different. The instructions asked the subject to write down against each stimulus the first word it made him think of, working as quickly as possible. The total time spent on this task was measured, and most subjects completed the sheet in five to ten minutes.
Most of the data was collected in a classroom setting under supervision. Sheets which had more than 25 percent blank responses were rejected and fresh data was collected.
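The rejection rule for sheets can be written as a simple check. This sketch assumes a sheet is a list of response strings in which an empty or whitespace-only string counts as blank; the function names are illustrative.

```python
def blank_fraction(responses):
    """Share of responses on a sheet that are blank."""
    blanks = sum(1 for response in responses if not response.strip())
    return blanks / len(responses)

def accept_sheet(responses, max_blank=0.25):
    """Accept a sheet unless more than 25 percent of its responses are blank."""
    return blank_fraction(responses) <= max_blank

# A 100-item sheet with 25 blanks is kept; one with 26 blanks is rejected.
sheet_ok = ["WORD"] * 75 + [""] * 25
sheet_bad = ["WORD"] * 74 + [""] * 26
print(accept_sheet(sheet_ok), accept_sheet(sheet_bad))  # True False
```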
History:
- Original EAT: George Kiss, Christine Armstrong, Robert Milroy and J.R.I. Piper (collected between June 1968 and May 1971).
- MRC Psycholinguistic Database Version modified by: Max Coltheart, S. James, J. Ramshaw, B.M. Philip, B. Reid, J. Benyon-Tinker and E. Doctor; made available by: Philip Quinlan.
- The present version was re-structured and documented by Michael Wilson at the Rutherford Appleton Laboratory in 1988 (2).
- Transformed in Pajek format: V. Batagelj, 31. July 2003.
- Combined RS and SR versions, removed duplicates: V. Batagelj, 12. August 2013.
References:
- Kiss, G.R., Armstrong, C., Milroy, R., and Piper, J. (1973) An associative thesaurus of English and its computer analysis. In Aitken, A.J., Bailey, R.W. and Hamilton-Smith, N. (Eds.), The Computer and Literary Studies. Edinburgh: University Press.
- The present version of The Edinburgh Associative Thesaurus
- WordNet
- MRC Psycholinguistic Database
- Coltheart, M. (1981) The MRC Psycholinguistic Database. Quarterly Journal of Experimental Psychology, 33A, 497-505.
- http://vlado.fmf.uni-lj.si/pub/networks/data/dic/eat/Eat.htm
- Download MRC Psycholinguistic Database 2