Why Classify?

During early childhood, parents try to improve their child’s classification skills. They begin with just two things, for example two cubes, and ask the child whether the cubes are the same or not. Once the child is able to identify things, the parents hand over a single cube and ask the child to find the matching one among five or six different cubes. Later, the parents give the child many objects that differ in color, shape, and so on, and this time the child needs to separate them into groups based on one category: color, size, or type (such as people or animals).

But why? What is the importance of teaching children how to identify and classify things? Why do parents care so much about teaching them how to sort things as they grow up?

It may be because sorting is a kind of math skill that children need to improve. Or maybe it is to teach them how to apply logical thinking to objects. But I think the most important reason is to build their decision-making skills.

This skill is important not only for humans but also for developing computer programs and improving their performance. Consider how doctors in a hospital emergency room identify the high-risk patients who need urgent help before others. They measure several distinct factors, including heart rate, blood pressure, age, and so on, and based on those values, a computer is responsible for deciding which patients should be classified as high-risk.

So, it is a classification task again!

Data classification in computing is all about tagging data so that it can be found quickly and efficiently when needed. It works much like the way humans categorize things. We provide the computer with some examples, called a “training set”, to teach it how to classify incoming data in the same manner. But if we keep human needs in mind, we cannot guarantee that the results of a classification task performed by a computer program will match human requirements.
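To make the idea of a training set more concrete, here is a minimal sketch of a nearest-neighbor classifier for the emergency-room example. The feature values, the two labels, and the `classify` function are all made up for illustration; they are not clinical guidance and not how any real hospital system works. The sketch simply gives a new patient the label of the most similar training example.

```python
import math

# Hypothetical training set: (heart_rate, systolic_bp, age) -> label.
# All numbers are invented for illustration only.
TRAINING_SET = [
    ((72, 120, 30), "low-risk"),
    ((80, 125, 45), "low-risk"),
    ((68, 118, 52), "low-risk"),
    ((130, 85, 70), "high-risk"),
    ((125, 90, 64), "high-risk"),
    ((140, 80, 81), "high-risk"),
]


def classify(sample, training_set=TRAINING_SET):
    """Return the label of the training example closest to `sample`."""
    # Euclidean distance between the feature tuples (1-nearest neighbor).
    nearest = min(training_set, key=lambda example: math.dist(example[0], sample))
    return nearest[1]


print(classify((135, 88, 75)))  # resembles the high-risk examples
print(classify((75, 122, 40)))  # resembles the low-risk examples
```

A real system would probably scale the features first, so that no single measurement dominates the distance, and would need far more examples than this.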

For this reason, some recent research compares the performance of computer programs that perform categorization with what humans do when they classify things. The problem with the human way of categorizing, though, is that it is not standardized. Every person comes up with a slightly different scheme, and even the same person might come up with slightly different categories for the same set of things if asked to do it multiple times.

So, to solve this problem, I think the best thing to do is to create a predefined list of items for both the computer and the humans. Then run a classification experiment to collect the human results. At the same time, run another experiment on the computer by applying some classification algorithms, and record their results as well. Based on an analysis of both outcomes, we can improve existing data classification methods to ensure that they meet human needs.
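One possible way to analyze both outcomes is to measure how often the human labels and the computer labels agree on the same predefined list. The sketch below is only an assumption about how that comparison could look: the items, the labels, and the function names are hypothetical. It computes the plain agreement rate and Cohen's kappa, a standard statistic that corrects the agreement rate for matches expected by chance.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two label lists match."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Agreement between two labelers, corrected for chance agreement."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same category at random.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both labelers used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical predefined list and labels, invented for illustration.
items = ["apple", "dog", "banana", "cat", "carrot", "horse"]
human = ["food", "animal", "food", "animal", "food", "animal"]
computer = ["food", "animal", "food", "animal", "animal", "animal"]

print(percent_agreement(human, computer))
print(cohen_kappa(human, computer))
```

The same two functions could also compare two people, or the same person at two different times, which would show how non-standardized human categorization is before the computer is judged against it.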

 

Additional Reads

 

http://www.computerweekly.com/feature/Data-classification-why-it-is-important-and-how-to-do-it

http://searchdatamanagement.techtarget.com/definition/data-classification


2 Responses to “Why Classify?”

  1. skhalifah says:

    Thanks for your reply, ke kang. I completely agree with you; it is not easy at all to build a classifier that gives us useful results. But I think we may also be able to achieve it without labels, by using clustering algorithms?

  2. ke kang says:

    Great read! Teaching computers to understand our world is a complex problem. A lack of clear, labeled data also makes it difficult to build an effective classifier in the real world.