
Structured prediction

Structured prediction or structured output learning is an umbrella term for supervised machine learning techniques that involve predicting structured objects, rather than discrete or real values.[1]

As with commonly used supervised learning techniques, structured prediction models are typically trained on observed data: the model's prediction is compared to the ground truth, and the discrepancy is used to adjust the model parameters. Due to the complexity of the model and the interrelations of the predicted variables, exact model training and inference are often computationally infeasible, so approximate inference and learning methods are used.

Applications

An example application is the problem of translating a natural language sentence into a syntactic representation such as a parse tree. This can be seen as a structured prediction problem[2] in which the structured output domain is the set of all possible parse trees. Structured prediction is used in a wide variety of domains including bioinformatics, natural language processing (NLP), speech recognition, and computer vision.

Example: sequence tagging

Sequence tagging is a class of problems prevalent in NLP in which input data are often sequential, for instance sentences of text. The sequence tagging problem appears in several guises, such as part-of-speech tagging (POS tagging) and named entity recognition. In POS tagging, for example, each word in a sequence must be 'tagged' with a class label representing the type of word:

This DT
is VBZ
a DT
tagged JJ
sentence. NN

The main challenge of this problem is to resolve ambiguity: in the above example, the words "sentence" and "tagged" in English can also be verbs.

While this problem can be solved by simply performing classification of individual tokens, this approach does not take into account the empirical fact that tags do not occur independently; instead, each tag displays a strong conditional dependence on the tag of the previous word. This fact can be exploited in a sequence model such as a hidden Markov model or conditional random field[2] that predicts the entire tag sequence for a sentence (rather than just individual tags) via the Viterbi algorithm.
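To make this concrete, the sketch below decodes the example sentence with a toy hidden Markov model via the Viterbi algorithm. The tag set, transition scores, and emission scores are assumed values chosen only so the example runs; they are not estimated from any corpus.

# A minimal sketch of Viterbi decoding with a toy hidden Markov model.
# The tag set, transition scores, and emission scores are assumed values
# chosen only so the example runs; they are not estimated from a corpus.

import math

TAGS = ["DT", "VBZ", "JJ", "NN"]
NEG_INF = float("-inf")

# log P(tag | previous tag); "<s>" marks the start of the sentence (assumed values)
TRANSITION = {
    ("<s>", "DT"): math.log(0.6),
    ("DT", "VBZ"): math.log(0.3),
    ("DT", "JJ"): math.log(0.3),
    ("DT", "NN"): math.log(0.4),
    ("VBZ", "DT"): math.log(0.5),
    ("JJ", "NN"): math.log(0.6),
    ("NN", "VBZ"): math.log(0.4),
}

# log P(word | tag); "tagged" and "sentence" are deliberately ambiguous (assumed values)
EMISSION = {
    ("this", "DT"): math.log(0.5),
    ("is", "VBZ"): math.log(0.6),
    ("a", "DT"): math.log(0.4),
    ("tagged", "JJ"): math.log(0.3),
    ("tagged", "VBZ"): math.log(0.2),
    ("sentence", "NN"): math.log(0.2),
    ("sentence", "VBZ"): math.log(0.1),
}

def viterbi(words):
    """Return the highest-scoring tag sequence for `words` under the toy HMM."""
    # best[i][tag] = (score of the best path ending in `tag` at position i, previous tag)
    best = [{t: (TRANSITION.get(("<s>", t), NEG_INF)
                 + EMISSION.get((words[0], t), NEG_INF), None) for t in TAGS}]
    for i in range(1, len(words)):
        column = {}
        for t in TAGS:
            score, prev = max(
                ((best[i - 1][p][0]
                  + TRANSITION.get((p, t), NEG_INF)
                  + EMISSION.get((words[i], t), NEG_INF), p)
                 for p in TAGS),
                key=lambda pair: pair[0])
            column[t] = (score, prev)
        best.append(column)
    # Backtrack from the best final tag to recover the full sequence.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))

print(viterbi(["this", "is", "a", "tagged", "sentence"]))
# ['DT', 'VBZ', 'DT', 'JJ', 'NN']

Although "tagged" and "sentence" also receive verb emission scores in this toy model, the transition scores favor the determiner-adjective-noun reading; this is exactly the kind of dependency between adjacent tags that a per-token classifier cannot exploit.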

Techniques

Probabilistic graphical models form a large class of structured prediction models. In particular, Bayesian networks and random fields are popular. Other algorithms and models for structured prediction include inductive logic programming, case-based reasoning, structured SVMs, Markov logic networks, Probabilistic Soft Logic, and constrained conditional models. The main techniques are:

Structured perceptron

One of the easiest ways to understand algorithms for general structured prediction is the structured perceptron by Collins.[3] This algorithm combines the perceptron algorithm for learning linear classifiers with an inference algorithm (classically the Viterbi algorithm when used on sequence data) and can be described abstractly as follows:

  1. First, define a function Φ(x, y) that maps a training sample x and a candidate prediction y to a vector of length n (x and y may have any structure; n is problem-dependent, but must be fixed for each model). Let GEN(x) be a function that generates candidate predictions for x.
  2. Then:
     Let w be a weight vector of length n
     For a predetermined number of iterations:
         For each sample x in the training set with true output t:
             Make a prediction ŷ = argmax over y in GEN(x) of w · Φ(x, y)
             Update w (from ŷ towards t): w = w + c(Φ(x, t) − Φ(x, ŷ)), where c is the learning rate.

In practice, finding the argmax over GEN(x) is done using an algorithm such as Viterbi or max-sum, rather than an exhaustive search through an exponentially large set of candidates.

The idea of learning is similar to that for multiclass perceptrons.
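The following sketch illustrates the procedure above on the sequence-tagging example. The feature map Φ (counts of word-tag and adjacent tag-tag pairs), the toy training data, and the exhaustive search standing in for GEN are assumptions made for illustration; a practical implementation would factorize the features and compute the argmax with Viterbi.

# A minimal sketch of the structured perceptron for sequence tagging.
# The feature map phi, the toy training data, and the exhaustive candidate
# generation below are illustrative assumptions; a practical implementation
# factorizes the features and uses Viterbi for the argmax.

from collections import defaultdict
from itertools import product

TAGS = ["DT", "VBZ", "JJ", "NN"]

def phi(words, tags):
    """Feature map Phi(x, y): counts of (word, tag) and (previous tag, tag) pairs."""
    feats = defaultdict(float)
    prev = "<s>"
    for w, t in zip(words, tags):
        feats[("emit", w, t)] += 1.0
        feats[("trans", prev, t)] += 1.0
        prev = t
    return feats

def score(weights, feats):
    # Dot product w . Phi(x, y) over the sparse feature representation.
    return sum(weights.get(f, 0.0) * value for f, value in feats.items())

def gen(words):
    # GEN(x): here simply all tag sequences; exponential, but fine for a toy input.
    return product(TAGS, repeat=len(words))

def predict(weights, words):
    # y_hat = argmax over y in GEN(x) of w . Phi(x, y)
    return max(gen(words), key=lambda y: score(weights, phi(words, y)))

def train(data, iterations=10, learning_rate=1.0):
    weights = defaultdict(float)
    for _ in range(iterations):
        for words, gold in data:
            guess = predict(weights, words)
            if guess != tuple(gold):
                # w <- w + c * (Phi(x, t) - Phi(x, y_hat))
                for f, value in phi(words, gold).items():
                    weights[f] += learning_rate * value
                for f, value in phi(words, guess).items():
                    weights[f] -= learning_rate * value
    return weights

# Toy training set (assumed for illustration): the tagged example sentence.
data = [(["this", "is", "a", "tagged", "sentence"],
         ["DT", "VBZ", "DT", "JJ", "NN"])]
w = train(data)
print(predict(w, ["this", "is", "a", "tagged", "sentence"]))
# After training, the prediction recovers the gold tags: ('DT', 'VBZ', 'DT', 'JJ', 'NN')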

References

  1. Bakır, Gökhan; Taskar, Ben; Hofmann, Thomas; Schölkopf, Bernhard; Smola, Alex; Vishwanathan, S. V. N. (2007). Predicting Structured Data. MIT Press.
  2. Lafferty, J.; McCallum, A.; Pereira, F. (2001). "Conditional random fields: Probabilistic models for segmenting and labeling sequence data" (PDF). Proc. 18th International Conf. on Machine Learning. pp. 282–289.
  3. Collins, Michael (2002). "Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms" (PDF). Proc. EMNLP. Vol. 10.