Performance of the Dataset Representations


This section presents the performance of the Dataset Text Representations; however, the library contains more representations than the ones presented here because, for some datasets, the test set is not available.

The performance is computed on the available test set, whereas the training set was used to develop the model. The performance measures are the f1 score, recall, and precision, and the values are reported for each class. In the binary case, the values correspond to the positive class. In the multiclass case, each label is treated in turn as the positive class and the remaining ones as the negative class; therefore, there is a set of measurements per class.
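This one-vs-rest treatment can be reproduced directly with scikit-learn; the following is a minimal sketch where the arrays y and hy are hypothetical gold and predicted labels used only for illustration (f1_score(y, hy, labels=np.unique(y), average=None) yields the same values).

>>> import numpy as np
>>> from sklearn.metrics import f1_score
>>> y = np.array(['NEG', 'NEU', 'POS', 'POS', 'NEG'])
>>> hy = np.array(['NEG', 'NEU', 'POS', 'NEG', 'NEG'])
>>> # each label in turn is the positive class; the rest are negative
>>> {label: f1_score(y == label, hy == label) for label in np.unique(y)}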

The procedure used to compute the performance is as follows. The first step is to import the necessary libraries:

>>> from EvoMSA.evodag import BoW
>>> from EvoMSA.utils import load_dataset
>>> from microtc.utils import tweet_iterator
>>> from sklearn.metrics import f1_score, recall_score, precision_score
>>> import numpy as np

The next step is to instantiate the text representation; the Delitos dataset is used as an example.

>>> bow = BoW(lang='es')
>>> bow.estimator_instance = load_dataset(lang='es', name='delitos_ingeotec', k=1)

Assuming the test set is stored in JSON format, with one record per line and each record containing the key text, the predictions are computed as follows.

>>> Dtest = list(tweet_iterator('delitos_test.json'))
>>> hy = bow.predict(Dtest)

The next step is to compute the performance. First, the ground truth is retrieved, i.e.,

>>> y = np.array([x['label'] for x in Dtest])

Then, a performance measure, e.g., recall, is computed as:

>>> recall_score(y, hy)
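The f1 score and precision follow the same pattern, using the functions imported earlier; for a dataset with more than two classes, passing average=None to these functions returns one value per label.

>>> f1_score(y, hy)
>>> precision_score(y, hy)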

The reported performance includes the estimated standard error, which is computed using bootstrap on the test set, assuming that the training set and the model are kept constant. The following code was used to estimate it.

>>> # bootstrap: 500 resamples of the test-set indices
>>> B = []
>>> for s in np.random.randint(y.shape[0], size=(500, y.shape[0])):
...     B.append(recall_score(y[s], hy[s]))
>>> np.std(B)
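The same resampling can be reused to estimate the standard error of the other measures; the following sketch assumes the arrays y and hy computed above are still in scope.

>>> S = np.random.randint(y.shape[0], size=(500, y.shape[0]))
>>> np.std([f1_score(y[s], hy[s]) for s in S])
>>> np.std([precision_score(y[s], hy[s]) for s in S])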

Spanish

MeTwo

label       f1                     recall                 precision
DOUBTFUL    \(0.3854 \pm 0.0456\)  \(0.8605 \pm 0.0553\)  \(0.2483 \pm 0.0361\)
NON_SEXIST  \(0.7827 \pm 0.0152\)  \(0.7372 \pm 0.0193\)  \(0.8342 \pm 0.0198\)
SEXIST      \(0.6501 \pm 0.0231\)  \(0.7409 \pm 0.0292\)  \(0.5791 \pm 0.0261\)

davincis2022_1

label  f1                     recall                 precision
1      \(0.7235 \pm 0.0208\)  \(0.7247 \pm 0.0260\)  \(0.7224 \pm 0.0253\)

delitos_ingeotec

label  f1                     recall                 precision
1      \(0.7727 \pm 0.0380\)  \(0.6711 \pm 0.0517\)  \(0.9107 \pm 0.0392\)

detests2022_task1

label  f1                     recall                 precision
1      \(0.6224 \pm 0.0314\)  \(0.5393 \pm 0.0363\)  \(0.7357 \pm 0.0366\)

exist2021_task1

label   f1                     recall                 precision
sexist  \(0.7408 \pm 0.0183\)  \(0.7735 \pm 0.0227\)  \(0.7108 \pm 0.0236\)

Overview of the HAHA Task [HAHA]

haha2018

label  f1                     recall                 precision
1      \(0.7459 \pm 0.0087\)  \(0.7004 \pm 0.0107\)  \(0.7977 \pm 0.0113\)

meoffendes2021_task1

label  f1                     recall                 precision
NO     \(0.9144 \pm 0.0038\)  \(0.8815 \pm 0.0059\)  \(0.9500 \pm 0.0041\)
NOM    \(0.5000 \pm 0.0228\)  \(0.8066 \pm 0.0263\)  \(0.3623 \pm 0.0214\)
OFG    \(0.0838 \pm 0.0159\)  \(0.6757 \pm 0.0822\)  \(0.0446 \pm 0.0088\)
OFP    \(0.5327 \pm 0.0172\)  \(0.7990 \pm 0.0207\)  \(0.3995 \pm 0.0170\)

meoffendes2021_task3

label  f1                     recall                 precision
1      \(0.5566 \pm 0.0259\)  \(0.4866 \pm 0.0286\)  \(0.6502 \pm 0.0322\)

Overview of MEX-A3T at IberEval 2018 [MEX-A3T]

mexa3t2018_aggress

label  f1                     recall                 precision
1      \(0.6866 \pm 0.0133\)  \(0.6455 \pm 0.0166\)  \(0.7333 \pm 0.0156\)

Overview of the Task on Automatic Misogyny Identification at IberEval 2018 [AMI]

misoginia

label  f1                     recall                 precision
1      \(0.7741 \pm 0.0177\)  \(0.7859 \pm 0.0227\)  \(0.7626 \pm 0.0238\)

misogyny_centrogeo

label  f1                     recall                 precision
1      \(0.8882 \pm 0.0088\)  \(0.8925 \pm 0.0110\)  \(0.8840 \pm 0.0114\)

SemEval-2018 Task 1: Affect in tweets [Task-1]

semeval2018_anger

label  f1                     recall                 precision
0      \(0.5646 \pm 0.0288\)  \(0.6243 \pm 0.0348\)  \(0.5153 \pm 0.0328\)
1      \(0.4453 \pm 0.0278\)  \(0.6073 \pm 0.0361\)  \(0.3515 \pm 0.0269\)
2      \(0.4131 \pm 0.0292\)  \(0.7163 \pm 0.0388\)  \(0.2902 \pm 0.0254\)
3      \(0.4023 \pm 0.0339\)  \(0.6509 \pm 0.0444\)  \(0.2911 \pm 0.0302\)

semeval2018_fear

label  f1                     recall                 precision
0      \(0.6876 \pm 0.0240\)  \(0.7225 \pm 0.0293\)  \(0.6560 \pm 0.0297\)
1      \(0.4364 \pm 0.0314\)  \(0.5934 \pm 0.0405\)  \(0.3450 \pm 0.0298\)
2      \(0.4141 \pm 0.0315\)  \(0.6560 \pm 0.0453\)  \(0.3026 \pm 0.0278\)
3      \(0.4600 \pm 0.0350\)  \(0.8214 \pm 0.0424\)  \(0.3194 \pm 0.0307\)

semeval2018_joy

label  f1                     recall                 precision
0      \(0.6922 \pm 0.0208\)  \(0.7986 \pm 0.0239\)  \(0.6108 \pm 0.0265\)
1      \(0.4170 \pm 0.0251\)  \(0.5765 \pm 0.0351\)  \(0.3266 \pm 0.0235\)
2      \(0.4795 \pm 0.0289\)  \(0.7115 \pm 0.0368\)  \(0.3616 \pm 0.0281\)
3      \(0.3853 \pm 0.0351\)  \(0.6632 \pm 0.0498\)  \(0.2716 \pm 0.0301\)

semeval2018_sadness

label  f1                     recall                 precision
0      \(0.6331 \pm 0.0269\)  \(0.7104 \pm 0.0315\)  \(0.5709 \pm 0.0319\)
1      \(0.4510 \pm 0.0277\)  \(0.5693 \pm 0.0341\)  \(0.3734 \pm 0.0283\)
2      \(0.3946 \pm 0.0294\)  \(0.6541 \pm 0.0421\)  \(0.2825 \pm 0.0256\)
3      \(0.4563 \pm 0.0403\)  \(0.7059 \pm 0.0521\)  \(0.3371 \pm 0.0368\)

semeval2018_valence

label  f1                     recall                 precision
-3     \(0.3755 \pm 0.0404\)  \(0.6500 \pm 0.0547\)  \(0.2640 \pm 0.0343\)
-2     \(0.3610 \pm 0.0301\)  \(0.6847 \pm 0.0458\)  \(0.2452 \pm 0.0242\)
-1     \(0.4222 \pm 0.0292\)  \(0.6597 \pm 0.0387\)  \(0.3105 \pm 0.0266\)
0      \(0.3463 \pm 0.0285\)  \(0.5594 \pm 0.0403\)  \(0.2508 \pm 0.0247\)
1      \(0.2609 \pm 0.0321\)  \(0.6176 \pm 0.0582\)  \(0.1654 \pm 0.0233\)
2      \(0.2435 \pm 0.0327\)  \(0.6471 \pm 0.0667\)  \(0.1500 \pm 0.0230\)
3      \(0.4095 \pm 0.0451\)  \(0.8431 \pm 0.0510\)  \(0.2704 \pm 0.0369\)

Overview of TASS 2017 [TASS2017-2016]

tass2016

label  f1                     recall                 precision
N      \(0.6250 \pm 0.0028\)  \(0.8317 \pm 0.0029\)  \(0.5006 \pm 0.0031\)
NEU    \(0.0743 \pm 0.0022\)  \(0.7946 \pm 0.0112\)  \(0.0390 \pm 0.0012\)
NONE   \(0.5923 \pm 0.0028\)  \(0.5876 \pm 0.0034\)  \(0.5971 \pm 0.0033\)
P      \(0.6952 \pm 0.0024\)  \(0.7496 \pm 0.0030\)  \(0.6482 \pm 0.0030\)

tass2017

label  f1                     recall                 precision
N      \(0.6460 \pm 0.0264\)  \(0.6667 \pm 0.0324\)  \(0.6266 \pm 0.0317\)
NEU    \(0.2555 \pm 0.0324\)  \(0.5942 \pm 0.0585\)  \(0.1627 \pm 0.0235\)
NONE   \(0.2960 \pm 0.0340\)  \(0.6613 \pm 0.0581\)  \(0.1907 \pm 0.0253\)
P      \(0.5691 \pm 0.0316\)  \(0.6859 \pm 0.0389\)  \(0.4864 \pm 0.0337\)

Overview of TASS 2018: Opinions, health and emotions [TASS2018]

tass2018_s1_l1

label   f1                     recall                 precision
UNSAFE  \(0.8013 \pm 0.0173\)  \(0.8322 \pm 0.0206\)  \(0.7726 \pm 0.0228\)

tass2018_s1_l2

label   f1                     recall                 precision
UNSAFE  \(0.8390 \pm 0.0031\)  \(0.8329 \pm 0.0039\)  \(0.8453 \pm 0.0040\)

tass2018_s2

label   f1                     recall                 precision
UNSAFE  \(0.7776 \pm 0.0189\)  \(0.8845 \pm 0.0198\)  \(0.6937 \pm 0.0254\)

English

SCv1

label  f1                     recall                 precision
1      \(0.6086 \pm 0.0175\)  \(0.6148 \pm 0.0207\)  \(0.6025 \pm 0.0205\)

SCv2-GEN

label  f1                     recall                 precision
1      \(0.6881 \pm 0.0105\)  \(0.6681 \pm 0.0125\)  \(0.7093 \pm 0.0133\)

SS-Twitter

label  f1                     recall                 precision
1      \(0.7824 \pm 0.0122\)  \(0.8230 \pm 0.0146\)  \(0.7455 \pm 0.0157\)

SS-Youtube

label  f1                     recall                 precision
1      \(0.8782 \pm 0.0088\)  \(0.9219 \pm 0.0096\)  \(0.8385 \pm 0.0126\)

business

label               f1                     recall                 precision
david_leonhardt     \(0.8000 \pm 0.0841\)  \(0.8000 \pm 0.1070\)  \(0.8000 \pm 0.1053\)
david_segal         \(0.4262 \pm 0.0830\)  \(0.8667 \pm 0.0887\)  \(0.2826 \pm 0.0701\)
david_streitfeld    \(0.7895 \pm 0.0765\)  \(1.0000 \pm 0.0000\)  \(0.6522 \pm 0.1029\)
james_glanz         \(0.8387 \pm 0.0794\)  \(0.8667 \pm 0.0948\)  \(0.8125 \pm 0.1029\)
javier_c_hernandez  \(0.8750 \pm 0.0657\)  \(0.9333 \pm 0.0708\)  \(0.8235 \pm 0.0937\)
louise_story        \(0.8485 \pm 0.0751\)  \(0.9333 \pm 0.0628\)  \(0.7778 \pm 0.1050\)

ccat

label            f1                     recall                 precision
AlanCrosby       \(1.0000 \pm 0.0000\)  \(1.0000 \pm 0.0000\)  \(1.0000 \pm 0.0000\)
AlexanderSmith   \(0.8197 \pm 0.0393\)  \(1.0000 \pm 0.0000\)  \(0.6944 \pm 0.0561\)
BenjaminKangLim  \(0.5119 \pm 0.0441\)  \(0.8600 \pm 0.0466\)  \(0.3644 \pm 0.0418\)
DavidLawder      \(0.6250 \pm 0.0530\)  \(0.7000 \pm 0.0651\)  \(0.5645 \pm 0.0633\)
JaneMacartney    \(0.5786 \pm 0.0471\)  \(0.9200 \pm 0.0387\)  \(0.4220 \pm 0.0475\)
JimGilchrist     \(0.9800 \pm 0.0146\)  \(0.9800 \pm 0.0219\)  \(0.9800 \pm 0.0195\)
MarcelMichelson  \(0.9375 \pm 0.0260\)  \(0.9000 \pm 0.0436\)  \(0.9783 \pm 0.0217\)
MureDickie       \(0.5217 \pm 0.0449\)  \(0.9600 \pm 0.0291\)  \(0.3582 \pm 0.0415\)
RobinSidel       \(0.8909 \pm 0.0329\)  \(0.9800 \pm 0.0201\)  \(0.8167 \pm 0.0508\)
ToddNissen       \(0.5938 \pm 0.0514\)  \(0.7600 \pm 0.0589\)  \(0.4872 \pm 0.0557\)

cricket

label                 f1                     recall                 precision
PeterRoebuck          \(0.7895 \pm 0.0808\)  \(1.0000 \pm 0.0000\)  \(0.6522 \pm 0.1074\)
SambitBal             \(0.8387 \pm 0.0787\)  \(0.8667 \pm 0.0902\)  \(0.8125 \pm 0.1047\)
dileep_premachandran  \(0.8966 \pm 0.0622\)  \(0.8667 \pm 0.0836\)  \(0.9286 \pm 0.0739\)
ian_chappel           \(0.9375 \pm 0.0470\)  \(1.0000 \pm 0.0000\)  \(0.8824 \pm 0.0804\)

news20c

label                     f1                     recall                 precision
alt.atheism               \(0.5464 \pm 0.0207\)  \(0.8025 \pm 0.0235\)  \(0.4142 \pm 0.0205\)
comp.graphics             \(0.4499 \pm 0.0157\)  \(0.8946 \pm 0.0160\)  \(0.3005 \pm 0.0134\)
comp.os.ms-windows.misc   \(0.5441 \pm 0.0166\)  \(0.8223 \pm 0.0187\)  \(0.4065 \pm 0.0167\)
comp.sys.ibm.pc.hardware  \(0.4506 \pm 0.0164\)  \(0.8776 \pm 0.0174\)  \(0.3031 \pm 0.0140\)
comp.sys.mac.hardware     \(0.5231 \pm 0.0168\)  \(0.9247 \pm 0.0135\)  \(0.3648 \pm 0.0157\)
comp.windows.x            \(0.6461 \pm 0.0167\)  \(0.9266 \pm 0.0136\)  \(0.4959 \pm 0.0187\)
misc.forsale              \(0.6237 \pm 0.0158\)  \(0.9564 \pm 0.0099\)  \(0.4628 \pm 0.0169\)
rec.autos                 \(0.5905 \pm 0.0166\)  \(0.9066 \pm 0.0139\)  \(0.4378 \pm 0.0172\)
rec.motorcycles           \(0.7206 \pm 0.0164\)  \(0.9070 \pm 0.0147\)  \(0.5977 \pm 0.0200\)
rec.sport.baseball        \(0.6600 \pm 0.0157\)  \(0.9093 \pm 0.0144\)  \(0.5179 \pm 0.0180\)
rec.sport.hockey          \(0.7894 \pm 0.0149\)  \(0.9298 \pm 0.0125\)  \(0.6858 \pm 0.0208\)
sci.crypt                 \(0.8543 \pm 0.0135\)  \(0.8737 \pm 0.0155\)  \(0.8357 \pm 0.0190\)
sci.electronics           \(0.4357 \pm 0.0165\)  \(0.8015 \pm 0.0204\)  \(0.2991 \pm 0.0142\)
sci.med                   \(0.6932 \pm 0.0183\)  \(0.8131 \pm 0.0207\)  \(0.6041 \pm 0.0221\)
sci.space                 \(0.7950 \pm 0.0152\)  \(0.8909 \pm 0.0160\)  \(0.7178 \pm 0.0204\)
soc.religion.christian    \(0.6757 \pm 0.0163\)  \(0.9347 \pm 0.0123\)  \(0.5292 \pm 0.0188\)
talk.politics.guns        \(0.6286 \pm 0.0175\)  \(0.8929 \pm 0.0160\)  \(0.4851 \pm 0.0193\)
talk.politics.mideast     \(0.8916 \pm 0.0116\)  \(0.8856 \pm 0.0163\)  \(0.8976 \pm 0.0151\)
talk.politics.misc        \(0.4055 \pm 0.0202\)  \(0.7097 \pm 0.0259\)  \(0.2839 \pm 0.0173\)
talk.religion.misc        \(0.3058 \pm 0.0166\)  \(0.7729 \pm 0.0272\)  \(0.1906 \pm 0.0122\)

news4c

label     f1                     recall                 precision
comp      \(0.9596 \pm 0.0032\)  \(0.9652 \pm 0.0042\)  \(0.9540 \pm 0.0045\)
politics  \(0.8709 \pm 0.0082\)  \(0.9029 \pm 0.0094\)  \(0.8412 \pm 0.0111\)
rec       \(0.9392 \pm 0.0044\)  \(0.9572 \pm 0.0054\)  \(0.9219 \pm 0.0065\)
religion  \(0.8638 \pm 0.0084\)  \(0.9205 \pm 0.0086\)  \(0.8137 \pm 0.0122\)

nfl

label          f1                     recall                 precision
joe_lapointe   \(0.8485 \pm 0.0700\)  \(0.9333 \pm 0.0632\)  \(0.7778 \pm 0.0994\)
judy_battista  \(0.8750 \pm 0.0630\)  \(0.9333 \pm 0.0650\)  \(0.8235 \pm 0.0899\)
pete_thamel    \(0.6957 \pm 0.1148\)  \(0.5333 \pm 0.1308\)  \(1.0000 \pm 0.0000\)

offenseval2019_A

label  f1                     recall                 precision
OFF    \(0.5829 \pm 0.0293\)  \(0.4833 \pm 0.0324\)  \(0.7342 \pm 0.0352\)

offenseval2019_B

label  f1                     recall                 precision
UNT    \(0.2857 \pm 0.0991\)  \(0.1852 \pm 0.0729\)  \(0.6250 \pm 0.1958\)

offenseval2019_C

label  f1                     recall                 precision
GRP    \(0.6556 \pm 0.0391\)  \(0.7564 \pm 0.0484\)  \(0.5784 \pm 0.0466\)
IND    \(0.6872 \pm 0.0400\)  \(0.6700 \pm 0.0490\)  \(0.7053 \pm 0.0466\)
OTH    \(0.3497 \pm 0.0506\)  \(0.7143 \pm 0.0821\)  \(0.2315 \pm 0.0395\)

poetry

label       f1                     recall                 precision
abbey       \(0.4545 \pm 0.1374\)  \(0.5000 \pm 0.1701\)  \(0.4167 \pm 0.1496\)
benet       \(0.7143 \pm 0.1030\)  \(1.0000 \pm 0.0000\)  \(0.5556 \pm 0.1183\)
eliot       \(0.6897 \pm 0.1011\)  \(1.0000 \pm 0.0000\)  \(0.5263 \pm 0.1149\)
hardy       \(0.6429 \pm 0.1113\)  \(0.9000 \pm 0.0947\)  \(0.5000 \pm 0.1212\)
wilde       \(0.3125 \pm 0.1058\)  \(0.5000 \pm 0.1690\)  \(0.2273 \pm 0.0901\)
wordsworth  \(0.4706 \pm 0.1500\)  \(0.8000 \pm 0.2056\)  \(0.3333 \pm 0.1352\)

r10

label         f1                     recall                 precision
acq           \(0.9744 \pm 0.0043\)  \(0.9856 \pm 0.0047\)  \(0.9635 \pm 0.0071\)
coffee        \(0.9778 \pm 0.0253\)  \(1.0000 \pm 0.0000\)  \(0.9565 \pm 0.0469\)
crude         \(0.8958 \pm 0.0205\)  \(0.9587 \pm 0.0182\)  \(0.8406 \pm 0.0318\)
earn          \(0.9875 \pm 0.0023\)  \(0.9871 \pm 0.0034\)  \(0.9880 \pm 0.0032\)
interest      \(0.7560 \pm 0.0343\)  \(0.9753 \pm 0.0183\)  \(0.6172 \pm 0.0436\)
money-fx      \(0.6537 \pm 0.0355\)  \(0.9655 \pm 0.0209\)  \(0.4941 \pm 0.0393\)
money-supply  \(0.4779 \pm 0.0609\)  \(0.9643 \pm 0.0346\)  \(0.3176 \pm 0.0526\)
ship          \(0.6195 \pm 0.0529\)  \(0.9722 \pm 0.0277\)  \(0.4545 \pm 0.0564\)
sugar         \(0.9412 \pm 0.0337\)  \(0.9600 \pm 0.0397\)  \(0.9231 \pm 0.0499\)
trade         \(0.7150 \pm 0.0358\)  \(0.9867 \pm 0.0143\)  \(0.5606 \pm 0.0431\)

r52

label            f1                     recall                 precision
acq              \(0.9539 \pm 0.0058\)  \(0.9813 \pm 0.0054\)  \(0.9280 \pm 0.0096\)
alum             \(0.6531 \pm 0.0791\)  \(0.8421 \pm 0.0816\)  \(0.5333 \pm 0.0894\)
bop              \(0.2857 \pm 0.0752\)  \(1.0000 \pm 0.0000\)  \(0.1667 \pm 0.0510\)
carcass          \(0.0199 \pm 0.0083\)  \(1.0000 \pm 0.0772\)  \(0.0101 \pm 0.0042\)
cocoa            \(0.9032 \pm 0.0619\)  \(0.9333 \pm 0.0676\)  \(0.8750 \pm 0.0874\)
coffee           \(0.9362 \pm 0.0365\)  \(1.0000 \pm 0.0000\)  \(0.8800 \pm 0.0631\)
copper           \(0.8966 \pm 0.0608\)  \(1.0000 \pm 0.0000\)  \(0.8125 \pm 0.0961\)
cotton           \(0.8182 \pm 0.0938\)  \(1.0000 \pm 0.0000\)  \(0.6923 \pm 0.1294\)
cpi              \(0.5263 \pm 0.0805\)  \(0.8824 \pm 0.0815\)  \(0.3750 \pm 0.0766\)
cpu              \(0.0204 \pm 0.0207\)  \(1.0000 \pm 0.4828\)  \(0.0103 \pm 0.0107\)
crude            \(0.8227 \pm 0.0259\)  \(0.9587 \pm 0.0188\)  \(0.7205 \pm 0.0368\)
dlr              \(0.2105 \pm 0.1336\)  \(0.6667 \pm 0.3406\)  \(0.1250 \pm 0.0916\)
earn             \(0.9862 \pm 0.0025\)  \(0.9880 \pm 0.0032\)  \(0.9844 \pm 0.0039\)
fuel             \(0.1818 \pm 0.0871\)  \(0.4286 \pm 0.2060\)  \(0.1154 \pm 0.0610\)
gas              \(0.0185 \pm 0.0084\)  \(0.6250 \pm 0.1903\)  \(0.0094 \pm 0.0043\)
gnp              \(0.3000 \pm 0.0589\)  \(1.0000 \pm 0.0000\)  \(0.1765 \pm 0.0409\)
gold             \(0.8163 \pm 0.0588\)  \(1.0000 \pm 0.0000\)  \(0.6897 \pm 0.0823\)
grain            \(0.0615 \pm 0.0193\)  \(1.0000 \pm 0.0000\)  \(0.0317 \pm 0.0103\)
heat             \(0.0845 \pm 0.0437\)  \(0.7500 \pm 0.2465\)  \(0.0448 \pm 0.0243\)
housing          \(0.2353 \pm 0.1316\)  \(1.0000 \pm 0.3026\)  \(0.1333 \pm 0.0874\)
income           \(0.1860 \pm 0.0813\)  \(1.0000 \pm 0.1089\)  \(0.1026 \pm 0.0500\)
instal-debt      \(0.0513 \pm 0.0489\)  \(1.0000 \pm 0.4844\)  \(0.0263 \pm 0.0263\)
interest         \(0.7817 \pm 0.0333\)  \(0.9506 \pm 0.0256\)  \(0.6638 \pm 0.0447\)
ipi              \(0.4074 \pm 0.0854\)  \(1.0000 \pm 0.0000\)  \(0.2558 \pm 0.0669\)
iron-steel       \(0.1727 \pm 0.0427\)  \(1.0000 \pm 0.0000\)  \(0.0945 \pm 0.0256\)
jet              \(0.0000 \pm 0.0000\)  \(0.0000 \pm 0.0000\)  \(0.0000 \pm 0.0000\)
jobs             \(0.7742 \pm 0.0906\)  \(1.0000 \pm 0.0000\)  \(0.6316 \pm 0.1145\)
lead             \(0.0485 \pm 0.0227\)  \(1.0000 \pm 0.1467\)  \(0.0248 \pm 0.0120\)
lei              \(0.2609 \pm 0.1250\)  \(1.0000 \pm 0.2551\)  \(0.1500 \pm 0.0828\)
livestock        \(0.0353 \pm 0.0152\)  \(1.0000 \pm 0.0631\)  \(0.0180 \pm 0.0079\)
lumber           \(0.0100 \pm 0.0050\)  \(1.0000 \pm 0.1530\)  \(0.0050 \pm 0.0025\)
meal-feed        \(0.0015 \pm 0.0015\)  \(1.0000 \pm 0.4800\)  \(0.0008 \pm 0.0007\)
money-fx         \(0.6667 \pm 0.0345\)  \(0.9540 \pm 0.0226\)  \(0.5123 \pm 0.0393\)
money-supply     \(0.4091 \pm 0.0531\)  \(0.9643 \pm 0.0350\)  \(0.2596 \pm 0.0421\)
nat-gas          \(0.2353 \pm 0.0558\)  \(1.0000 \pm 0.0000\)  \(0.1333 \pm 0.0361\)
nickel           \(0.0021 \pm 0.0022\)  \(1.0000 \pm 0.4782\)  \(0.0011 \pm 0.0011\)
orange           \(0.4737 \pm 0.1066\)  \(1.0000 \pm 0.0000\)  \(0.3103 \pm 0.0909\)
pet-chem         \(0.0227 \pm 0.0089\)  \(1.0000 \pm 0.0631\)  \(0.0115 \pm 0.0046\)
platinum         \(0.0024 \pm 0.0016\)  \(1.0000 \pm 0.2918\)  \(0.0012 \pm 0.0008\)
potato           \(0.0328 \pm 0.0180\)  \(1.0000 \pm 0.1812\)  \(0.0167 \pm 0.0093\)
reserves         \(0.2526 \pm 0.0593\)  \(1.0000 \pm 0.0000\)  \(0.1446 \pm 0.0387\)
retail           \(0.0870 \pm 0.0819\)  \(1.0000 \pm 0.4859\)  \(0.0455 \pm 0.0466\)
rubber           \(0.5161 \pm 0.1092\)  \(0.8889 \pm 0.1134\)  \(0.3636 \pm 0.1024\)
ship             \(0.5528 \pm 0.0557\)  \(0.9444 \pm 0.0369\)  \(0.3908 \pm 0.0542\)
strategic-metal  \(0.0214 \pm 0.0090\)  \(0.8333 \pm 0.1807\)  \(0.0108 \pm 0.0046\)
sugar            \(0.8846 \pm 0.0502\)  \(0.9200 \pm 0.0541\)  \(0.8519 \pm 0.0700\)
tea              \(0.0072 \pm 0.0041\)  \(1.0000 \pm 0.2551\)  \(0.0036 \pm 0.0021\)
tin              \(0.0706 \pm 0.0215\)  \(0.9000 \pm 0.1107\)  \(0.0367 \pm 0.0116\)
trade            \(0.6577 \pm 0.0356\)  \(0.9733 \pm 0.0180\)  \(0.4966 \pm 0.0396\)
veg-oil          \(0.2136 \pm 0.0546\)  \(1.0000 \pm 0.0000\)  \(0.1196 \pm 0.0341\)
wpi              \(0.6207 \pm 0.1090\)  \(1.0000 \pm 0.0000\)  \(0.4500 \pm 0.1116\)
zinc             \(0.0249 \pm 0.0114\)  \(1.0000 \pm 0.1089\)  \(0.0126 \pm 0.0059\)

r8

label     f1                     recall                 precision
acq       \(0.9752 \pm 0.0040\)  \(0.9871 \pm 0.0040\)  \(0.9635 \pm 0.0070\)
crude     \(0.8712 \pm 0.0229\)  \(0.9504 \pm 0.0197\)  \(0.8042 \pm 0.0345\)
earn      \(0.9875 \pm 0.0024\)  \(0.9871 \pm 0.0033\)  \(0.9880 \pm 0.0033\)
grain     \(0.1513 \pm 0.0444\)  \(0.9000 \pm 0.0982\)  \(0.0826 \pm 0.0264\)
interest  \(0.8000 \pm 0.0301\)  \(0.9877 \pm 0.0119\)  \(0.6723 \pm 0.0419\)
money-fx  \(0.7414 \pm 0.0330\)  \(0.9885 \pm 0.0114\)  \(0.5931 \pm 0.0413\)
ship      \(0.4242 \pm 0.0471\)  \(0.9722 \pm 0.0294\)  \(0.2713 \pm 0.0382\)
trade     \(0.7813 \pm 0.0326\)  \(1.0000 \pm 0.0000\)  \(0.6410 \pm 0.0438\)

semeval2017

label     f1                     recall                 precision
negative  \(0.6153 \pm 0.0054\)  \(0.8200 \pm 0.0062\)  \(0.4924 \pm 0.0058\)
neutral   \(0.6034 \pm 0.0053\)  \(0.6069 \pm 0.0065\)  \(0.5999 \pm 0.0061\)
positive  \(0.5592 \pm 0.0079\)  \(0.6884 \pm 0.0094\)  \(0.4708 \pm 0.0088\)

semeval2018_anger

label  f1                     recall                 precision
0      \(0.6560 \pm 0.0182\)  \(0.6624 \pm 0.0219\)  \(0.6498 \pm 0.0223\)
1      \(0.2529 \pm 0.0229\)  \(0.5135 \pm 0.0410\)  \(0.1678 \pm 0.0172\)
2      \(0.3647 \pm 0.0242\)  \(0.5185 \pm 0.0332\)  \(0.2812 \pm 0.0219\)
3      \(0.4584 \pm 0.0289\)  \(0.6986 \pm 0.0381\)  \(0.3411 \pm 0.0266\)

semeval2018_fear

label  f1                     recall                 precision
0      \(0.7536 \pm 0.0132\)  \(0.7393 \pm 0.0172\)  \(0.7685 \pm 0.0170\)
1      \(0.2122 \pm 0.0243\)  \(0.4758 \pm 0.0452\)  \(0.1366 \pm 0.0174\)
2      \(0.2900 \pm 0.0248\)  \(0.4873 \pm 0.0402\)  \(0.2064 \pm 0.0198\)
3      \(0.3297 \pm 0.0375\)  \(0.6479 \pm 0.0578\)  \(0.2212 \pm 0.0296\)

semeval2018_joy

label  f1                     recall                 precision
0      \(0.4585 \pm 0.0276\)  \(0.6546 \pm 0.0350\)  \(0.3528 \pm 0.0267\)
1      \(0.4160 \pm 0.0228\)  \(0.4985 \pm 0.0273\)  \(0.3570 \pm 0.0231\)
2      \(0.4642 \pm 0.0213\)  \(0.6222 \pm 0.0276\)  \(0.3702 \pm 0.0208\)
3      \(0.4684 \pm 0.0236\)  \(0.7661 \pm 0.0288\)  \(0.3374 \pm 0.0218\)

semeval2018_sadness

label  f1                     recall                 precision
0      \(0.6737 \pm 0.0188\)  \(0.7236 \pm 0.0236\)  \(0.6302 \pm 0.0223\)
1      \(0.3173 \pm 0.0244\)  \(0.5285 \pm 0.0382\)  \(0.2267 \pm 0.0198\)
2      \(0.3944 \pm 0.0230\)  \(0.5569 \pm 0.0315\)  \(0.3054 \pm 0.0211\)
3      \(0.4141 \pm 0.0283\)  \(0.6822 \pm 0.0431\)  \(0.2973 \pm 0.0245\)

semeval2018_valence

label  f1                     recall                 precision
-3     \(0.3280 \pm 0.0314\)  \(0.6667 \pm 0.0496\)  \(0.2175 \pm 0.0244\)
-2     \(0.4034 \pm 0.0248\)  \(0.7066 \pm 0.0341\)  \(0.2823 \pm 0.0214\)
-1     \(0.1600 \pm 0.0207\)  \(0.5250 \pm 0.0542\)  \(0.0944 \pm 0.0133\)
0      \(0.4591 \pm 0.0231\)  \(0.5992 \pm 0.0307\)  \(0.3720 \pm 0.0227\)
1      \(0.2389 \pm 0.0256\)  \(0.5794 \pm 0.0483\)  \(0.1505 \pm 0.0181\)
2      \(0.2667 \pm 0.0271\)  \(0.6154 \pm 0.0503\)  \(0.1702 \pm 0.0199\)
3      \(0.5037 \pm 0.0311\)  \(0.7445 \pm 0.0375\)  \(0.3806 \pm 0.0305\)

travel

label              f1                     recall                 precision
jeff_bailey        \(0.8966 \pm 0.0605\)  \(0.8667 \pm 0.0889\)  \(0.9286 \pm 0.0696\)
matthew_wald       \(0.9091 \pm 0.0561\)  \(1.0000 \pm 0.0000\)  \(0.8333 \pm 0.0913\)
micheline_maynard  \(0.5714 \pm 0.0846\)  \(0.8000 \pm 0.1038\)  \(0.4444 \pm 0.0866\)
michelle_higgins   \(0.8333 \pm 0.0679\)  \(1.0000 \pm 0.0000\)  \(0.7143 \pm 0.0982\)

Arabic

semeval2017

label     f1                     recall                 precision
negative  \(0.5977 \pm 0.0076\)  \(0.7570 \pm 0.0090\)  \(0.4938 \pm 0.0085\)
neutral   \(0.4803 \pm 0.0092\)  \(0.4670 \pm 0.0103\)  \(0.4944 \pm 0.0106\)
positive  \(0.4505 \pm 0.0101\)  \(0.5594 \pm 0.0129\)  \(0.3771 \pm 0.0105\)

semeval2017_taskBD

label     f1                     recall                 precision
positive  \(0.7391 \pm 0.0087\)  \(0.7322 \pm 0.0113\)  \(0.7461 \pm 0.0109\)

semeval2018_anger

label  f1                     recall                 precision
0      \(0.4475 \pm 0.0402\)  \(0.6622 \pm 0.0534\)  \(0.3379 \pm 0.0380\)
1      \(0.4437 \pm 0.0353\)  \(0.5462 \pm 0.0453\)  \(0.3736 \pm 0.0351\)
2      \(0.2179 \pm 0.0336\)  \(0.4667 \pm 0.0648\)  \(0.1421 \pm 0.0240\)
3      \(0.5741 \pm 0.0338\)  \(0.7583 \pm 0.0395\)  \(0.4619 \pm 0.0354\)

semeval2018_fear

label  f1                     recall                 precision
0      \(0.5447 \pm 0.0380\)  \(0.6837 \pm 0.0469\)  \(0.4527 \pm 0.0399\)
1      \(0.4030 \pm 0.0407\)  \(0.6000 \pm 0.0564\)  \(0.3034 \pm 0.0358\)
2      \(0.4953 \pm 0.0351\)  \(0.5852 \pm 0.0429\)  \(0.4293 \pm 0.0371\)
3      \(0.3368 \pm 0.0463\)  \(0.6531 \pm 0.0708\)  \(0.2270 \pm 0.0373\)

semeval2018_joy

label  f1                     recall                 precision
0      \(0.4615 \pm 0.0376\)  \(0.7600 \pm 0.0479\)  \(0.3314 \pm 0.0341\)
1      \(0.4255 \pm 0.0332\)  \(0.5385 \pm 0.0412\)  \(0.3518 \pm 0.0334\)
2      \(0.5860 \pm 0.0277\)  \(0.6429 \pm 0.0336\)  \(0.5385 \pm 0.0327\)
3      \(0.3711 \pm 0.0424\)  \(0.7660 \pm 0.0644\)  \(0.2449 \pm 0.0339\)

semeval2018_sadness

label  f1                     recall                 precision
0      \(0.6099 \pm 0.0337\)  \(0.6719 \pm 0.0406\)  \(0.5584 \pm 0.0398\)
1      \(0.2414 \pm 0.0359\)  \(0.4912 \pm 0.0633\)  \(0.1600 \pm 0.0269\)
2      \(0.3478 \pm 0.0359\)  \(0.5333 \pm 0.0500\)  \(0.2581 \pm 0.0312\)
3      \(0.5714 \pm 0.0386\)  \(0.6737 \pm 0.0478\)  \(0.4961 \pm 0.0431\)

semeval2018_valence

label  f1                     recall                 precision
-3     \(0.2932 \pm 0.0293\)  \(0.7000 \pm 0.0520\)  \(0.1854 \pm 0.0214\)
-2     \(0.4885 \pm 0.0268\)  \(0.6648 \pm 0.0340\)  \(0.3861 \pm 0.0270\)
-1     \(0.2362 \pm 0.0273\)  \(0.6716 \pm 0.0570\)  \(0.1433 \pm 0.0187\)
0      \(0.2681 \pm 0.0259\)  \(0.5676 \pm 0.0453\)  \(0.1755 \pm 0.0194\)
1      \(0.2607 \pm 0.0279\)  \(0.5556 \pm 0.0493\)  \(0.1703 \pm 0.0208\)
2      \(0.4578 \pm 0.0316\)  \(0.7000 \pm 0.0395\)  \(0.3401 \pm 0.0294\)
3      \(0.4318 \pm 0.0391\)  \(0.7403 \pm 0.0515\)  \(0.3048 \pm 0.0349\)

Chinese

NLPCC2013_emotion

label      f1                     recall                 precision
Anger      \(0.3404 \pm 0.0212\)  \(0.6716 \pm 0.0323\)  \(0.2280 \pm 0.0170\)
Disgust    \(0.4972 \pm 0.0176\)  \(0.7356 \pm 0.0217\)  \(0.3755 \pm 0.0171\)
Fear       \(0.1219 \pm 0.0192\)  \(0.8043 \pm 0.0608\)  \(0.0660 \pm 0.0111\)
Happiness  \(0.5850 \pm 0.0176\)  \(0.7348 \pm 0.0208\)  \(0.4859 \pm 0.0191\)
Like       \(0.5991 \pm 0.0161\)  \(0.7289 \pm 0.0190\)  \(0.5086 \pm 0.0182\)
Sadness    \(0.5292 \pm 0.0192\)  \(0.7674 \pm 0.0233\)  \(0.4038 \pm 0.0194\)
Surprise   \(0.1735 \pm 0.0193\)  \(0.6782 \pm 0.0539\)  \(0.0995 \pm 0.0120\)

NLPCC2013_opinion

label  f1                     recall                 precision
Y      \(0.8968 \pm 0.0114\)  \(0.9288 \pm 0.0140\)  \(0.8670 \pm 0.0168\)

NLPCC2013_polarity

label  f1                     recall                 precision
NEG    \(0.7850 \pm 0.0242\)  \(0.8025 \pm 0.0309\)  \(0.7683 \pm 0.0320\)
NEU    \(0.0420 \pm 0.0239\)  \(0.5000 \pm 0.2212\)  \(0.0219 \pm 0.0128\)
OTHER  \(0.0787 \pm 0.0326\)  \(0.5556 \pm 0.1606\)  \(0.0424 \pm 0.0184\)
POS    \(0.7628 \pm 0.0262\)  \(0.7605 \pm 0.0331\)  \(0.7651 \pm 0.0329\)

online_shopping_polarity

label  f1                     recall                 precision
POS    \(0.9222 \pm 0.0025\)  \(0.9200 \pm 0.0035\)  \(0.9245 \pm 0.0031\)

simplifyweibo_4_moods

label      f1                     recall                 precision
Anger      \(0.4039 \pm 0.0033\)  \(0.6628 \pm 0.0048\)  \(0.2905 \pm 0.0029\)
Happiness  \(0.7612 \pm 0.0018\)  \(0.7150 \pm 0.0022\)  \(0.8138 \pm 0.0022\)
Sadness    \(0.4311 \pm 0.0034\)  \(0.6696 \pm 0.0047\)  \(0.3178 \pm 0.0031\)

waimai_polarity

label  f1                     recall                 precision
POS    \(0.9030 \pm 0.0054\)  \(0.9070 \pm 0.0073\)  \(0.8991 \pm 0.0078\)

weibo_senti_100k_polarity

label  f1                     recall                 precision
POS    \(0.9056 \pm 0.0019\)  \(0.9289 \pm 0.0023\)  \(0.8834 \pm 0.0028\)

References

Human-Annotated

Igor Mozetič, Miha Grčar, and Jasmina Smailović. Multilingual Twitter sentiment classification: The role of human annotators. PLoS ONE, 11(5):1–26, May 2016.

Task-4

S. Rosenthal, N. Farra, and P. Nakov, SemEval-2017 Task 4: Sentiment analysis in Twitter, in Proc. of the 11th International Workshop on Semantic Evaluation. ACL, Aug. 2017, pp. 502–518.

Task-1

S. M. Mohammad, F. Bravo-Marquez, M. Salameh, and S. Kiritchenko, SemEval-2018 Task 1: Affect in tweets, in Proc. of the 12th International Workshop on Semantic Evaluation. ACL, June 2018, pp. 1–17.

TASS2017-2016

Eugenio Martínez-Cámara, Manuel C. Díaz-Galiano, Ángel García-Cumbreras, Manuel García-Vega, and Julio Villena-Román. Overview of TASS 2017. CEUR Workshop Proceedings, 1896:13–21, Sept. 2017.

TASS2018

Eugenio Martínez-Cámara, Yudivián Almeida-Cruz, Manuel Carlos Díaz-Galiano, Suilan Estévez-Velarde, Miguel A. García-Cumbreras, Manuel García-Vega, Yoan Gutiérrez, Arturo Montejo-Ráez, Andrés Montoyo, Rafael Muñoz, Alejandro Piad-Morffis, and Julio Villena-Román. Overview of TASS 2018: Opinions, health and emotions. CEUR Workshop Proceedings, 2172:13–27, Sept. 2018.

MEX-A3T

Miguel A. Álvarez-Carmona, Estefanía Guzmán-Falcón, Manuel Montes-y-Gómez, Hugo Jair Escalante, Luis Villaseñor-Pineda, Verónica Reyes-Meza, and Antonio Rico-Sulayes. Overview of MEX-A3T at IberEval 2018: Authorship and aggressiveness analysis in Mexican Spanish tweets. CEUR Workshop Proceedings, 2150:74–96, Sept. 2018.

HAHA

Santiago Castro, Luis Chiruzzo, and Aiala Rosá. Overview of the HAHA Task: Humor analysis based on human annotation at IberEval 2018. CEUR Workshop Proceedings, 2150:187–194, Sept. 2018.

AMI

E. Fersini, P. Rosso, and M. Anzovino. Overview of the Task on Automatic Misogyny Identification at IberEval 2018. CEUR Workshop Proceedings, 2150:214–228, Sept. 2018.

SS

Mike Thelwall, Kevan Buckley, and Georgios Paltoglou. Sentiment strength detection for the social web. Journal of the American Society for Information Science and Technology, 63(1):163–173, Jan. 2012.

SCv1

Marilyn Walker, Jean Fox Tree, Pranav Anand, Rob Abbott, and Joseph King. A corpus for research on deliberation and debate. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 812–817, Istanbul, Turkey, May 2012. European Language Resources Association (ELRA).

SCv2-GEN

Shereen Oraby, Vrindavan Harrison, Lena Reed, Ernesto Hernandez, Ellen Riloff, and Marilyn Walker. Creating and characterizing a diverse corpus of sarcasm in dialogue. In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 31–41, Los Angeles, September 2016. Association for Computational Linguistics.