Corpus info

General stats

Tokens          966412
Words           865315
Types            23156
Lemmas           15543
Hapax legomena   10830
Dis legomena      3487
POS tags           496

Tokens = strings separated by whitespace (punctuation marks included).
Words = strings separated by whitespace (punctuation marks excluded).
Types = unique words (based on standardized spelling, case-insensitive).
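
As a rough illustration of the three definitions above, the following Python sketch counts tokens, words and types in a string whose punctuation marks are already separated by whitespace. The function name corpus_counts, the sample sentence and the set of punctuation characters are assumptions made for this example, not part of the corpus tooling. It also assumes that spelling has already been standardized.

```python
import string

# Assumption: a punctuation mark is a token made up only of these characters.
PUNCTUATION = set(string.punctuation) | set("¿¡«»")


def corpus_counts(text):
    """Return (tokens, words, types) for a whitespace-tokenized text."""
    # Tokens: every whitespace-separated string, punctuation included.
    tokens = text.split()
    # Words: tokens that are not pure punctuation.
    words = [t for t in tokens if not all(ch in PUNCTUATION for ch in t)]
    # Types: unique words, compared case-insensitively.
    types = {w.lower() for w in words}
    return len(tokens), len(words), len(types)


sample = "Una mesa de pino , una silla ."
print(corpus_counts(sample))  # (8, 6, 5)
```

In the sample, "Una" and "una" collapse into a single type, and the comma and full stop count as tokens but not as words.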

Documents

Number of documents               725
Average (tokens per document)    1333
Median (tokens per document)      984
Longest document (tokens)       15663
Shortest document (tokens)        108
Oldest document (year)           1600
Most recent document (year)      1896

Group by part of speech

Main POS tag       N       %
common noun   213281   22.07
preposition   155930   16.13
determiner    105974   10.97
punctuation   101097   10.46
verb           91582    9.48
numeral        72021    7.45
conjunction    70419    7.29
adjective      45326    4.69
pronoun        38046    3.94
proper noun    34044    3.52
adverb         26916    2.79
untagged       11406    1.18
foreign word     272    0.03
interjection      98    0.01
Total         966412  100.00

Group by project

Project          N       %
CORDEREGRA  365578   37.83
HISPATESD   247593   25.62
ALEA18      176369   18.25
CORTENEX    167883   17.37
ALEA19        6443    0.67
VIVE          2438    0.25
_              108    0.01
Total       966412  100.00

Group by text type

Text type                 N       %
inventory of goods   697334   72.16
witness statement    192917   19.96
medical certificate   71302    7.38
other                  4640    0.48
OTH                     219    0.02
Total                966412  100.00

Group by century

Century       N       %
XVIII    578204   59.83
XVII     270685   28.01
XIX      117523   12.16
Total    966412  100.00
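
The % column in these tables appears to be each count divided by the corpus total of 966412 tokens, rounded to two decimals. A minimal Python sketch using the century figures above (the variable names are illustrative):

```python
# Token counts per century, copied from the table above.
counts = {"XVIII": 578204, "XVII": 270685, "XIX": 117523}

# The grand total is the sum of the groups.
total = sum(counts.values())

# Share of each group as a percentage, rounded to two decimals.
shares = {century: round(100 * n / total, 2) for century, n in counts.items()}

for century, share in shares.items():
    print(f"{century:<6}{counts[century]:>8}{share:>8.2f}")
```

The same calculation applies to the other "Group by" tables, since each of them sums to the same total.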

Group by province

Province         N       %
Granada     167492   17.33
Jaén        113009   11.69
Almería     112057   11.60
Badajoz     110973   11.48
Madrid       75183    7.78
Cádiz        72787    7.53
Burgos       68793    7.12
Málaga       68178    7.05
Cáceres      61248    6.34
Sevilla      33045    3.42
Huelva       27320    2.83
Murcia       14863    1.54
Valladolid   12720    1.32
La Rioja      5349    0.55
Cantabria     4944    0.51
Toledo        3982    0.41
Palencia      3785    0.39
Zamora        2150    0.22
Navarra       2011    0.21
Álava         1680    0.17
Soria         1444    0.15
León          1380    0.14
Córdoba        883    0.09
Gipuzkoa       605    0.06
Teruel         293    0.03
Salamanca      238    0.02
Total       966412  100.00

Group by institution

Institution                                         N       %
Archivo de la Real Chancillería de Granada     237396   24.56
Archivo Histórico Provincial de Badajoz        107981   11.17
Archivo Histórico Provincial de Jaén           105466   10.91
Archivo Histórico de Protocolos de Madrid       73290    7.58
Archivo Histórico Provincial de Burgos          66656    6.90
Archivo Histórico Provincial de Almería         63774    6.60
Archivo Histórico Provincial de Cáceres         59902    6.20
Archivo Histórico Provincial de Cádiz           51101    5.29
Archivo Histórico de Protocolos de Granada      49570    5.13
Archivo de la Real Chancillería de Valladolid   44578    4.61
Archivo Histórico Provincial de Huelva          26573    2.75
Archivo Histórico Provincial de Sevilla         24702    2.56
Archivo Histórico Municipal de Lorca            14863    1.54
Archivo Municipal de Puerto Real                14253    1.47
Archivo Histórico Provincial de Málaga          11945    1.24
Archivo Histórico Municipal de Baeza             6443    0.67
Archivo Municipal de Vera                        5807    0.60
Archivo Histórico Provincial de Córdoba           883    0.09
Archivo Histórico Municipal de Loja               781    0.08
AHPC                                              448    0.05
Total                                          966412  100.00

Group by century and province (absolute frequencies)

Province              XV     XVI    XVII   XVIII     XIX  Total (province)  Total (area)
Almería                0       0   26626   47667   37764            112057        392558
Granada                0       0   51452  113411    2629            167492
Jaén                   0       0       0  109275    3734            113009
Málaga                 0       0   26420   33732    8026             68178         69061
Córdoba                0       0       0       0     883               883
Cádiz                  0       0       0   70676    2111             72787        133152
Sevilla                0       0     168   32877       0             33045
Huelva                 0       0       0   27320       0             27320
Madrid                 0       0       0   36074   39109             75183        143976
Burgos                 0       0       0   67453    1340             68793
others                 0       0  166019   39719   21927            227665        227665
Total (century)        0       0  270685  578204  117523            966412        966412

Group by century and province (relative frequencies)

Province              XV     XVI    XVII   XVIII     XIX  Total (province)  Total (area)
Almería             0.00    0.00    2.76    4.93    3.91             11.60         40.62
Granada             0.00    0.00    5.32   11.74    0.27             17.33
Jaén                0.00    0.00    0.00   11.31    0.39             11.69
Málaga              0.00    0.00    2.73    3.49    0.83              7.05          7.15
Córdoba             0.00    0.00    0.00    0.00    0.09              0.09
Cádiz               0.00    0.00    0.00    7.31    0.22              7.53         13.78
Sevilla             0.00    0.00    0.02    3.40    0.00              3.42
Huelva              0.00    0.00    0.00    2.83    0.00              2.83
Madrid              0.00    0.00    0.00    3.73    4.05              7.78         14.90
Burgos              0.00    0.00    0.00    6.98    0.14              7.12
others              0.00    0.00   17.18    4.11    2.27             23.56         23.56
Total (century)     0.00    0.00   28.01   59.83   12.16            100.00        100.00

Measures of lexical diversity

Measure  Description                           Formula  Result
TTR      type-token ratio                      $TTR = \frac{V}{N}$  0.027
RTTR     Guiraud's root type-token ratio       $RTTR = \frac{V}{\sqrt{N}}$  24.893
CTTR     Carroll's corrected type-token ratio  $CTTR = \frac{V}{\sqrt{2N}}$  17.602
C        Herdan's C index                      $C = \frac{\log V}{\log N}$  0.735
S        Somers' S index                       $S = \frac{\log(\log V)}{\log(\log N)}$  0.882
M        Maas' index                           $M = \frac{\log N - \log V}{\log^2 N}$  0.036
H        Honoré's index                        $H = 100 \cdot \frac{\log N}{1 - \frac{V_1}{V}}$  2568.247
K        Yule's K index                        $K = 10^4 \cdot \left[ -\frac{1}{N} + \sum_{i=1}^{V} f_v(i, N) \left( \frac{i}{N} \right)^2 \right]$  172.347
D        Simpson's D index                     $D = \sum_{i=1}^{V} f_v(i, N) \cdot \frac{i}{N} \cdot \frac{i - 1}{N - 1}$  0.017
HTR      Hapax-token ratio                     $HTR = \frac{V_1}{V}$  0.468
DTR      Dis-token ratio                       $DTR = \frac{V_2}{V}$  0.151
VGR      Vocabulary growth rate                $VGR = \frac{V_1}{N}$  0.013

N = number of words; V = number of types; $V_1$ = number of hapax legomena; $V_2$ = number of dis legomena; $f_v(i, N)$ = number of types occurring i times in a sample of length N.
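
As a cross-check, most of the measures above can be recomputed from the four counts in the general statistics (N = 865315, V = 23156, V1 = 10830, V2 = 3487). The Python sketch below is an illustration, not the tool that produced the table. The function names are hypothetical, and natural logarithms are an assumption, chosen because they reproduce the reported S and H. Maas' M is left out because its value depends on the logarithm base, which is not stated here. K and D need the full frequency spectrum, which is not published on this page, so they are shown on a toy spectrum only.

```python
import math


def diversity_from_counts(n, v, v1, v2):
    """Measures that need only N, V, V1 and V2 (natural logs assumed)."""
    return {
        "TTR": v / n,
        "RTTR": v / math.sqrt(n),
        "CTTR": v / math.sqrt(2 * n),
        "C": math.log(v) / math.log(n),
        "S": math.log(math.log(v)) / math.log(math.log(n)),
        "H": 100 * math.log(n) / (1 - v1 / v),
        "HTR": v1 / v,
        "DTR": v2 / v,
        "VGR": v1 / n,
    }


def yule_k_and_simpson_d(spectrum):
    """Return (K, D) for a spectrum mapping i -> f_v(i, N).

    f_v(i, N) is the number of types that occur exactly i times.
    Requires a sample of at least two words (N - 1 is a divisor).
    """
    n = sum(i * f for i, f in spectrum.items())
    k = 1e4 * (-1 / n + sum(f * (i / n) ** 2 for i, f in spectrum.items()))
    d = sum(f * (i / n) * ((i - 1) / (n - 1)) for i, f in spectrum.items())
    return k, d


# Counts taken from the general statistics of the corpus.
measures = diversity_from_counts(865315, 23156, 10830, 3487)
for name, value in measures.items():
    print(f"{name:<5}{value:10.3f}")

# Toy spectrum (made up): three hapax legomena and one type occurring twice.
toy_k, toy_d = yule_k_and_simpson_d({1: 3, 2: 1})
print(f"toy K = {toy_k:.3f}, toy D = {toy_d:.3f}")
```

Rounded to three decimals, the values printed by the first loop agree with the Result column for the nine measures it covers.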