Corpus info

General stats

Tokens            957958
Words             858741
Types              23000
Lemmas             15403
Hapax legomena     10788
Dis legomena        3456
POS tags             492

Tokens = strings separated by whitespace (punctuation marks included).
Words = strings separated by whitespace (punctuation marks excluded).
Types = unique words (based on standardized spelling; case-insensitive).
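
The three definitions above can be illustrated with a minimal sketch. It assumes that punctuation marks are already separated from words by whitespace, approximates "punctuation" with Python's `string.punctuation` plus the Spanish marks `¿` and `¡`, and does not model spelling standardization. The function name `corpus_counts` and the sample sentence are hypothetical, not part of the corpus tooling.

```python
import string

# Assumption: this set approximates the corpus' notion of punctuation marks.
PUNCT = string.punctuation + "¿¡"

def corpus_counts(text: str) -> dict:
    # Tokens: whitespace-separated strings, punctuation marks included.
    tokens = text.split()
    # Words: tokens that are not made up solely of punctuation.
    words = [t for t in tokens if t.strip(PUNCT)]
    # Types: unique words, compared case-insensitively.
    types = {w.lower() for w in words}
    return {"tokens": len(tokens), "words": len(words), "types": len(types)}

sample = "Una mesa , una silla y una cama ."
print(corpus_counts(sample))  # {'tokens': 9, 'words': 7, 'types': 5}
```

In the sample, the comma and the full stop count as tokens but not as words, and "Una"/"una" collapse into a single type.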

Documents

Number of documents                 715
Average (tokens per document)      1340
Median (tokens per document)        984
Longest document (tokens)         15662
Shortest document (tokens)          108
Oldest document (year)             1600
Most recent document (year)        1896

Group by part of speech

Main POS tag         N       %
common noun     209121   21.83
preposition     153378   16.01
determiner      104146   10.87
punctuation      99217   10.36
verb             90477    9.44
numeral          70160    7.32
conjunction      69515    7.26
adjective        44485    4.64
pronoun          37474    3.91
proper noun      33608    3.51
adverb           26733    2.79
untagged         19279    2.01
foreign word       267    0.03
interjection        98    0.01
Total           957958  100.00

Group by project

Project            N       %
CORDEREGRA    365398   38.14
HISPATESD     247580   25.84
ALEA18        176991   18.48
CORTENEX      167881   17.52
_                108    0.01
Total         957958  100.00

Group by text type

Text type                   N       %
inventory of goods     689374   71.96
witness statement      192884   20.13
medical certificate     71289    7.44
other                    4411    0.46
Total                  957958  100.00

Group by century

Century         N       %
XVIII      576984   60.23
XVII       270081   28.19
XIX        110893   11.58
Total      957958  100.00

Group by province

Province           N       %
Granada       167491   17.48
Almería       112006   11.69
Badajoz       110971   11.58
Jaén          106495   11.12
Madrid         75183    7.85
Cádiz          70772    7.39
Burgos         68789    7.18
Málaga         68176    7.12
Cáceres        61247    6.39
Sevilla        33045    3.45
Huelva         28388    2.96
Murcia         14825    1.55
Valladolid     12718    1.33
La Rioja        5345    0.56
Cantabria       4944    0.52
Toledo          3981    0.42
Palencia        3785    0.40
Zamora          2149    0.22
Navarra         2011    0.21
Álava           1680    0.18
Soria           1444    0.15
León            1378    0.14
Gipuzkoa         604    0.06
Teruel           293    0.03
Salamanca        238    0.02
Total         957958  100.00

Group by institution

Institution                                           N       %
Archivo de la Real Chancillería de Granada       237390   24.78
Archivo Histórico Provincial de Badajoz          107980   11.27
Archivo Histórico Provincial de Jaén             105395   11.00
Archivo Histórico de Protocolos de Madrid         73290    7.65
Archivo Histórico Provincial de Burgos            66652    6.96
Archivo Histórico Provincial de Almería           63774    6.66
Archivo Histórico Provincial de Cáceres           59901    6.25
Archivo Histórico de Protocolos de Granada        49571    5.17
Archivo Histórico Provincial de Cádiz             49534    5.17
Archivo de la Real Chancillería de Valladolid     44567    4.65
Archivo Histórico Provincial de Huelva            27641    2.89
Archivo Histórico Provincial de Sevilla           24702    2.58
Archivo Histórico Municipal de Lorca              14825    1.55
Archivo Municipal de Puerto Real                  14253    1.49
Archivo Histórico Provincial de Málaga            11945    1.25
Archivo Municipal de Vera                          5757    0.60
Archivo Histórico Municipal de Loja                 781    0.08
Total                                            957958  100.00

Group by century and province (absolute frequencies)

                  XV  XVI    XVII   XVIII     XIX  Total (province)  Total (area)
Almería            0    0   26576   47666   37764            112006        385992
Granada            0    0   50902  113960    2629            167491
Jaén               0    0       0  106495       0            106495
Málaga             0    0   26419   33732    8025             68176         68176
Córdoba            0    0       0       0       0                 0
Cádiz              0    0       0   70664     108             70772        132205
Sevilla            0    0     168   32877       0             33045
Huelva             0    0       0   28388       0             28388
Madrid             0    0       0   36074   39109             75183        143972
Burgos             0    0       0   67449    1340             68789
others             0    0  166016   39679   21918            227613        227613
Total (century)    0    0  270081  576984  110893            957958        957958

Group by century and province (relative frequencies)

                    XV    XVI   XVII  XVIII    XIX  Total (province)  Total (area)
Almería           0.00   0.00   2.77   4.98   3.94             11.69         40.29
Granada           0.00   0.00   5.31  11.90   0.27             17.48
Jaén              0.00   0.00   0.00  11.12   0.00             11.12
Málaga            0.00   0.00   2.76   3.52   0.84              7.12          7.12
Córdoba           0.00   0.00   0.00   0.00   0.00              0.00
Cádiz             0.00   0.00   0.00   7.38   0.01              7.39         13.80
Sevilla           0.00   0.00   0.02   3.43   0.00              3.45
Huelva            0.00   0.00   0.00   2.96   0.00              2.96
Madrid            0.00   0.00   0.00   3.77   4.08              7.85         15.03
Burgos            0.00   0.00   0.00   7.04   0.14              7.18
others            0.00   0.00  17.33   4.14   2.29             23.76         23.76
Total (century)   0.00   0.00  28.19  60.23  11.58            100.00        100.00
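
The relative frequencies are consistent with dividing each absolute count by the corpus total of 957958 tokens and rounding to two decimals. The sketch below recomputes the Almería row under that assumption; the helper name `relative` is hypothetical.

```python
TOTAL_TOKENS = 957958  # corpus total, from "General stats"

def relative(n: int, total: int = TOTAL_TOKENS) -> float:
    # Share of the whole corpus, as a percentage rounded to two decimals.
    return round(100 * n / total, 2)

# Absolute token counts for Almería, taken from the table of absolute frequencies.
almeria = {"XVII": 26576, "XVIII": 47666, "XIX": 37764}

shares = {century: relative(n) for century, n in almeria.items()}
print(shares)                           # {'XVII': 2.77, 'XVIII': 4.98, 'XIX': 3.94}
print(relative(sum(almeria.values())))  # 11.69
```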

Measures of lexical diversity

Measure  Description                            Formula                                                                                      Result
TTR      Type-token ratio                       $\mathrm{TTR} = \frac{V}{N}$                                                                 0.027
RTTR     Guiraud's root type-token ratio        $\mathrm{RTTR} = \frac{V}{\sqrt{N}}$                                                         24.820
CTTR     Carroll's corrected type-token ratio   $\mathrm{CTTR} = \frac{V}{\sqrt{2N}}$                                                        17.550
C        Herdan's C index                       $C = \frac{\log V}{\log N}$                                                                  0.735
S        Somer's S index                        $S = \frac{\log(\log V)}{\log(\log N)}$                                                      0.882
M        Maas' index                            $M = \frac{\log N - \log V}{\log^{2} N}$                                                     0.036
H        Honoré's index                         $H = 100 \cdot \frac{\log N}{1 - \frac{V_1}{V}}$                                             2573.322
K        Yule's K index                         $K = 10^{4} \cdot \left[ -\frac{1}{N} + \sum_{i=1}^{V} f_v(i, N) \left( \frac{i}{N} \right)^{2} \right]$   172.143
D        Simpson's D index                      $D = \sum_{i=1}^{V} f_v(i, N) \cdot \frac{i}{N} \cdot \frac{i - 1}{N - 1}$                   0.017
HTR      Hapax-token ratio                      $\mathrm{HTR} = \frac{V_1}{V}$                                                               0.469
DTR      Dis-token ratio                        $\mathrm{DTR} = \frac{V_2}{V}$                                                               0.150
VGR      Vocabulary growth rate                 $\mathrm{VGR} = \frac{V_1}{N}$                                                               0.013

$N$ = number of words; $V$ = number of types; $V_1$ = number of hapax legomena; $V_2$ = number of dis legomena; $f_v(i, N)$ = number of types occurring $i$ times in a sample of length $N$.
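
As a rough check, the measures that depend only on N, V, V1 and V2 can be recomputed from the figures in "General stats". The sketch below assumes natural logarithms, which is an assumption rather than something the page states. Yule's K and Simpson's D need the full frequency spectrum, which is not listed here, so they are shown on a toy word list only. The function names are hypothetical.

```python
import math
from collections import Counter

def closed_form_measures(N: int, V: int, V1: int, V2: int) -> dict:
    log = math.log  # assumption: natural logarithm
    return {
        "TTR": V / N,
        "RTTR": V / math.sqrt(N),
        "CTTR": V / math.sqrt(2 * N),
        "C": log(V) / log(N),
        "S": log(log(V)) / log(log(N)),
        "H": 100 * log(N) / (1 - V1 / V),
        "HTR": V1 / V,
        "DTR": V2 / V,
        "VGR": V1 / N,
    }

def spectrum_measures(words: list) -> tuple:
    N = len(words)
    # Frequency spectrum: maps i to f_v(i, N), the number of types occurring i times.
    spectrum = Counter(Counter(words).values())
    K = 1e4 * (-1 / N + sum(f * (i / N) ** 2 for i, f in spectrum.items()))
    D = sum(f * (i / N) * ((i - 1) / (N - 1)) for i, f in spectrum.items())
    return K, D

# Figures from "General stats": words, types, hapax legomena, dis legomena.
for name, value in closed_form_measures(858741, 23000, 10788, 3456).items():
    print(f"{name}: {value:.3f}")

# Toy sample (not corpus data): one type each occurring 3, 2 and 1 times.
K, D = spectrum_measures("a a a b b c".split())
print(f"K: {K:.3f}  D: {D:.3f}")
```

With these inputs, the nine closed-form values round to the results given in the table for the same measures.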