Hi pththao, I have a question about this passage in the paper:
I don't really understand the DCA technique yet. Could you explain it to me? Thanks!
Node-based. Node-based approaches rely on comparing the
properties of the terms involved, which can be related to the terms
themselves, their ancestors, or their descendants. One concept
commonly used in these approaches is information content (IC),
which measures how specific and informative a term is. The
IC of a term c can be quantified as its negative log-likelihood:
$$\mathrm{IC}(c) = -\log p(c)$$
where p(c) is the probability of occurrence of c in a specific corpus
(such as the UniProt Knowledgebase), normally estimated
from its annotation frequency. Alternatively, the IC can also be
calculated from the number of children a term has in the GO
structure [7], although this approach is less commonly used.
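A minimal Python sketch of the annotation-frequency estimate of IC, assuming a tiny hypothetical ontology (terms root, A, B, C, D) and invented annotation counts. Because annotating a term implicitly annotates all of its ancestors, each term's frequency here includes the annotations of its descendants, and p(c) is that frequency divided by the root's frequency. The helper names `ancestors` and `annotation_ic` are illustrative, not from any GO library, and the sketch counts annotations rather than distinct gene products.

```python
import math

# Hypothetical toy ontology: term -> list of parents (is_a edges).
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],
}

# Hypothetical number of direct annotations to each term.
DIRECT = {"root": 0, "A": 2, "B": 3, "C": 5, "D": 10}


def ancestors(term, parents):
    """All ancestors of `term`, including the term itself."""
    seen, stack = {term}, [term]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def annotation_ic(parents, direct, root="root"):
    """IC(c) = -log p(c), with p(c) = freq(c) / freq(root).

    freq(c) adds up the direct annotations of c and of every descendant
    of c, since annotating a term implicitly annotates all its ancestors.
    Terms with zero frequency get no IC (it would be infinite).
    """
    freq = {t: 0 for t in parents}
    for term, n in direct.items():
        for anc in ancestors(term, parents):
            freq[anc] += n
    return {t: -math.log(f / freq[root]) for t, f in freq.items() if f > 0}


ic = annotation_ic(PARENTS, DIRECT)
for term in sorted(ic, key=ic.get):
    print(f"{term}: annotation-based IC = {ic[term]:.3f}")
```

C and D sit at the same depth, but C is annotated less often, so it ends up with the higher IC; the root, which every annotation reaches, gets IC 0.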
The concept of IC can be applied to the common ancestors two
terms have, to quantify the information they share and thus
measure their semantic similarity. There are two main approaches
for doing this: the most informative common ancestor (MICA
technique), in which only the common ancestor with the highest
IC is considered [8]; and the disjoint common ancestors (DCA
technique), in which all disjoint common ancestors (the common ancestors that do not subsume any other common ancestor) are
considered [9].
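Since DCA is the part being asked about, here is a small Python sketch contrasting it with MICA on a hypothetical DAG with hypothetical IC values. It follows the passage's wording: a disjoint common ancestor is a common ancestor that is not a proper ancestor of (does not subsume) any other common ancestor. Published DCA formulations may define disjointness differently (for example in terms of paths through the graph), so treat this as an illustration of the idea rather than a reimplementation of [9]. Averaging the IC of the disjoint common ancestors is also an assumption about how they are combined; all function names are illustrative.

```python
# Hypothetical toy DAG (term -> parents) and hypothetical IC values.
PARENTS = {
    "root": [], "A": ["root"], "B": ["root"],
    "C": ["A"], "D": ["B"],
    "X": ["C", "D"], "Y": ["C", "D"],
}
IC = {"root": 0.0, "A": 0.5, "B": 0.7, "C": 1.2, "D": 2.0, "X": 3.0, "Y": 3.5}


def ancestors(term):
    """All ancestors of `term`, including the term itself."""
    seen, stack = {term}, [term]
    while stack:
        for p in PARENTS[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def common_ancestors(t1, t2):
    return ancestors(t1) & ancestors(t2)


def mica(t1, t2):
    """Most informative common ancestor: the common ancestor with the highest IC."""
    return max(common_ancestors(t1, t2), key=lambda t: IC[t])


def disjoint_common_ancestors(t1, t2):
    """Common ancestors that are not a proper ancestor of any other
    common ancestor (the passage's definition of 'disjoint')."""
    ca = common_ancestors(t1, t2)
    return {a for a in ca if not any(a != b and a in ancestors(b) for b in ca)}


def shared_ic_mica(t1, t2):
    return IC[mica(t1, t2)]


def shared_ic_dca(t1, t2):
    # Assumption: the disjoint common ancestors are combined by averaging their IC.
    dca = disjoint_common_ancestors(t1, t2)
    return sum(IC[a] for a in dca) / len(dca)


print("common ancestors:", sorted(common_ancestors("X", "Y")))
print("MICA:", mica("X", "Y"), "shared IC =", shared_ic_mica("X", "Y"))
print("DCA:", sorted(disjoint_common_ancestors("X", "Y")), "shared IC =", shared_ic_dca("X", "Y"))
```

For X and Y, MICA keeps only D, the single most informative common ancestor. DCA keeps both C and D: they are reached through different branches (C via A, D via B) and neither subsumes the other. A, B and root are dropped because each subsumes C or D. The shared information is then based on both lines of inheritance instead of only the best one.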
Approaches based on IC are less sensitive to the issues of
variable semantic distance and variable node density than edge-based measures [8], because the IC gives a measure of a term’s
specificity that is independent of its depth in the ontology (the IC
of a term is dependent on its children but not on its parents).
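To make the depth-independence point concrete, a small Python sketch with hypothetical terms and counts: inserting a new, unannotated intermediate term M between root and C makes C one level deeper, but leaves C's annotation-based IC unchanged, because that IC depends only on the annotations of C and its descendants relative to the root. The helper names and data are illustrative only.

```python
import math


def ancestors(term, parents):
    """All ancestors of `term`, including the term itself."""
    seen, stack = {term}, [term]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def ic(term, parents, direct, root="root"):
    """Annotation-based IC: -log(freq(term) / freq(root)), where freq(t)
    sums the direct annotations of t and of all its descendants."""
    def freq(t):
        return sum(n for d, n in direct.items() if t in ancestors(d, parents))
    return -math.log(freq(term) / freq(root))


def depth(term, parents):
    """Number of edges on the longest path from `term` up to the root."""
    ps = parents[term]
    return 0 if not ps else 1 + max(depth(p, parents) for p in ps)


# Hypothetical direct annotation counts.
direct = {"B": 4, "C": 4}
# Before: C hangs directly under root.
before = {"root": [], "B": ["root"], "C": ["root"]}
# After: a new, unannotated intermediate term M is inserted above C.
after = {"root": [], "B": ["root"], "M": ["root"], "C": ["M"]}

for name, onto in [("before", before), ("after", after)]:
    print(name, "depth(C) =", depth("C", onto), "IC(C) =", round(ic("C", onto, direct), 3))
```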
However, the IC is biased by current trends in biomedical
research, because terms related to areas of scientific interest are
expected to be more frequently annotated than other terms.
Nevertheless, the use of the IC still makes sense from a
probabilistic point of view: it is more probable (and less
meaningful) that two gene products share a commonly used term
than an uncommonly used term, regardless of whether that term is
common because it is generic or because it is related to a hot
research topic.
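The probabilistic argument can be checked with simple arithmetic. This Python sketch assumes, purely for illustration, that two gene products are each annotated with a term independently with probability p, so the chance that both carry it is p squared; the two p values are invented.

```python
import math


def chance_both_annotated(p):
    """Probability that two gene products both carry a term, ASSUMING each is
    annotated with it independently with probability p (illustration only)."""
    return p * p


def information_content(p):
    """IC = -log p."""
    return -math.log(p)


# Hypothetical annotation probabilities for a frequent and a rare term.
for label, p in [("frequent term", 0.5), ("rare term", 0.01)]:
    print(f"{label}: p = {p}, P(both share it) = {chance_both_annotated(p):.4g}, "
          f"IC = {information_content(p):.3f}")
```

With these numbers the rare term is 2,500 times less likely to be shared by chance (0.25 versus 0.0001), and its IC is correspondingly higher. The code never asks why a term is frequent, which mirrors the point that generic terms and hot-topic terms are treated alike.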
Other node-based approaches include looking at the number of
shared annotations, that is, the number of gene products
annotated with both terms [10]; computing the number of shared
ancestors across the GO structure; and using other types of
information such as node depth and node link density (i.e., node degree) [11].
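The other node-based measures in this last paragraph are mostly simple counts. The Python sketch below computes each of them on a hypothetical ontology and annotation set. Counting only direct annotations (no propagation to ancestors), using the shortest path to the root as depth, and counting parents plus children as degree are choices made for illustration, not necessarily those used in [10] or [11].

```python
# Hypothetical toy ontology (term -> parents) and annotations
# (gene product -> set of terms it is directly annotated with).
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "C": ["A", "B"], "D": ["A"]}
ANNOTATIONS = {
    "gp1": {"C"},
    "gp2": {"C", "D"},
    "gp3": {"D"},
    "gp4": {"B", "D"},
}


def ancestors(term):
    """All ancestors of `term`, including the term itself."""
    seen, stack = {term}, [term]
    while stack:
        for p in PARENTS[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def shared_annotations(t1, t2):
    """Number of gene products directly annotated with both terms."""
    return sum(1 for terms in ANNOTATIONS.values() if t1 in terms and t2 in terms)


def shared_ancestors(t1, t2):
    """Number of ancestors the two terms have in common."""
    return len(ancestors(t1) & ancestors(t2))


def depth(term):
    """Number of edges on the shortest path from `term` up to the root."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + min(depth(p) for p in parents)


def degree(term):
    """Node degree: number of parents plus number of children."""
    n_children = sum(1 for ps in PARENTS.values() if term in ps)
    return len(PARENTS[term]) + n_children


print("shared annotations(C, D):", shared_annotations("C", "D"))
print("shared ancestors(C, D):", shared_ancestors("C", "D"))
print("depth:", {t: depth(t) for t in PARENTS})
print("degree:", {t: degree(t) for t in PARENTS})
```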