8/23/2005

Achieving a steady ontology through agreement and lightweight iteration?

There is no doubt that ontologies are used mainly for knowledge representation.

In knowledge representation, we would ideally express everything we know, so that the agent can find and understand anything it needs while performing its task. Unfortunately, that is impossible. The other extreme is to give the agent only the knowledge it might need in the designed scenario. In that case the agent will stop working as soon as it meets an unexpected situation (yes, statistical methods might make things better, but they don’t solve the problem).

I think ontologies struggle somewhere in the middle. That is, agents share some basic cognition about the world (the ontology), so that their knowledge (the instance-level ontology) can be shared on the basis of that common cognition. But we can keep arguing about how far this basic cognition (the ontology) should be modeled. Of course, this is the same question we face in knowledge representation, because an ontology is itself knowledge too.

So we can see that an ontology is a kind of “modeled for the future, expressed with what we know now”. That is difficult: who can tell what will or will not happen in the future? The current solution is to collect experience from everyone. We find a group of people; they argue, they fight, they compromise, and they finally decide “if we have these concepts and relations, it will cover 80% of tasks in the future”. An ontology is an agreement.

So it may be shabby, because the creator has no patience for so many concepts and relations, or the agent doesn’t need that much knowledge at present, or the “expert” himself in fact knows little about the working domain. (I guess these often occur in domain ontology construction.) Thus we will feel a pressing need for ontology evolution much more often than the domain itself changes, because the need is driven both by the incompleteness of the ontology and by real changes in the working domain.

Much work related to this problem has been done, such as ontology evolution, ontology integration and ontology mapping. For these techniques to be widely tested and used, I would argue that we need lightweight iteration: publish quickly, use quickly, review quickly and move on to the next iteration quickly. It is just like the idea behind lightweight software development, or open source. The lightweight iterations should go on until the ontology is steady. After that, changes will occur mainly because of changes in the domain itself.
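To make “steady” a little more concrete, here is a minimal sketch in Python. Everything in it is my own assumption for illustration: the `Ontology` class, the `change_ratio` measure (the share of concepts and relations that were added or removed between two versions) and the 0.1 threshold are hypothetical, not part of any existing ontology tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ontology:
    """A toy ontology version: a set of concepts and a set of relation triples."""
    concepts: frozenset   # concept names, e.g. "Agent"
    relations: frozenset  # (subject, predicate, object) triples


def change_ratio(old: Ontology, new: Ontology) -> float:
    """Fraction of all items (concepts and relations) that differ between two versions."""
    old_items = old.concepts | old.relations
    new_items = new.concepts | new.relations
    union = old_items | new_items
    if not union:
        return 0.0
    # Symmetric difference = items added plus items removed.
    return len(old_items ^ new_items) / len(union)


def first_steady_version(versions, threshold=0.1):
    """Index of the first version that changed by at most `threshold`
    compared with its predecessor, or None if no version is that stable."""
    for i in range(1, len(versions)):
        if change_ratio(versions[i - 1], versions[i]) <= threshold:
            return i
    return None
```

Each published version would be compared with the one before it, and the quick iterations would stop once a review round changes almost nothing. A real measure would probably have to weight concepts and relations differently, but the stopping idea is the same.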

But I have to admit that lightweight iteration is not an easy thing. Well, it is what I am going to work on next.

p.s. We know the definition of an ontology is “an explicit specification of a conceptualization”. One of the features behind this definition is that “an ontology is an agreement among a community”; all of the discussion above concerns this feature only.
p.s. This post was written in Microsoft Word using the “Blogger for Word” add-in. I am really looking forward to Google’s own IM now.

1 comment:

Francesco Sclano said...

TermExtractor is online! It's a FREE and high-performing tool for terminology extraction.

TermExtractor, my master thesis, is online at the
address http://lcl2.di.uniroma1.it.

TermExtractor is a FREE and high-performing software
package for Terminology Extraction and a very useful starting-point for Ontology Construction.
The software helps a web community to
extract and validate relevant domain terms in their
domain of interest, by submitting an archive of
domain-related documents in any format
(txt, pdf, ps, dvi, tex, doc, rtf, ppt, xls, xml,
html/htm, chm, wpd and also zip archives).

TermExtractor extracts terminology that is consensually
referred to in a specific application domain. The
software takes as input a corpus of domain documents,
parses the documents, and extracts a list of
"syntactically plausible" terms (e.g. compounds,
adjective-nouns, etc.).
Document parsing assigns greater importance
to terms with text layouts (title, abstract, bold, italic,
underlined, etc.). Two entropy-based measures, called
Domain Relevance and Domain Consensus, are then used.
Domain Consensus is used to select only the terms
which are consensually referred to throughout the corpus
documents. Domain Relevance is used to select only the terms
which are relevant to the domain of interest; Domain
Relevance is computed with reference to a set of
contrastive terminologies from different domains.
Finally, extracted terms are further filtered using
Lexical Cohesion, which measures the degree of
association of all the words in a terminological
string.

NEW: TermExtractor now allows a group of users to
validate an extracted terminology. See the news at
http://lcl2.di.uniroma1.it/termextractor/news.jsp


--
Francesco Sclano
home page: http://lcl2.di.uniroma1.it/~sclano
msn: francesco_sclano@yahoo.it
skype: francesco978