A. Blog Content Category Classification
Knowing how to classify blog content is important for three reasons:
One way to classify your blog content is to pick keywords or phrases from your article, e.g. Library Science, Blogging, Web Design. The other way is to pick category facets that are mutually orthogonal, as in the Colon system, a library classification system developed by Ranganathan. This system has five facets: Personality, Matter or Property, Energy, Space, and Time.
We can restrict our content categories to five labels per blog post, one per facet. Since there are too many possible keywords, we cannot write them all as blog post labels. What I propose is a compound labeling system per facet. In this system, if a facet has several applicable labels, such as text, photo, and video, then we arrange these labels in alphabetical order separated by hyphens, e.g. photo-text-video. Fixing the order removes the permutations of each label combination, so that they do not eat up Google's allowed budget of 2,000 labels per blog. The only drawback of this method is that you need a sizable number of blog posts, so that there is a reasonable chance that several posts share the same compound keyword.
To count the possible compound keywords, we use some results from combinatorics. If $n$ is the number of keywords for a particular facet and $r$ is the number of keywords in a compound, then the number of keyword combinations that can be formed is
\begin{equation} C(n,r) = \frac{n!}{(n-r)!\,r!}. \end{equation}
For example, if there are $n=6$ keywords and you take them $r=4$ at a time, then the number of keyword combinations is $C(6,4) = 6!/(4!\,2!)=15$. Now, if we sum the combinations from $r=0$ to $r=n$, we obtain a special case of the binomial theorem:
\begin{equation} \sum_{r=0}^{n} C(n,r) = 2^n. \end{equation}
For example, if there are $n=6$ keywords for a particular facet, then the total number of compound keywords that can be made is $2^6=64$ (this count includes the empty combination, $r=0$). If there are 5 facets with the same number of keywords each, then the total number of compound keywords for the whole blog is $64\times 5=320$.
B. Compound Keyword Category System
1. Personality
For example, if the article is about programming, physics, and literature, we write the label as "literature-physics-programming".
2. Matter or Property
For example, if the article has texts, graphics, and videos, we write the label as "graphics-text-videos".
3. Energy
For example, if the article is about the cost of marketing, we can label it as "financial-social".
4. Space
For example, if the topic is the interaction of the lithosphere, ionosphere, and magnetosphere, we can write the label as "ionosphere-lithosphere-magnetosphere".
5. Time
For example, if the article is a news item with some thoughts on the future, we can write the label as "future-news".
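The labeling rule and the counting formulas above can be checked with a short script. This is only a minimal sketch in Python: the function names compound_label and all_compound_labels are my own hypothetical helpers, not part of Blogger or any library, and I assume that keywords are compared in lowercase and that a keyword does not itself contain a hyphen.

```python
from itertools import combinations
from math import comb  # available in Python 3.8 and later


def compound_label(keywords):
    """Build one compound label for a facet (hypothetical helper).

    Duplicates are dropped, keywords are lowercased (an assumption),
    sorted alphabetically, and joined with hyphens.
    """
    cleaned = {k.strip().lower() for k in keywords if k.strip()}
    return "-".join(sorted(cleaned))


def all_compound_labels(vocabulary):
    """List every non-empty compound label a facet vocabulary can produce."""
    words = sorted({w.strip().lower() for w in vocabulary if w.strip()})
    return [
        "-".join(combo)
        for r in range(1, len(words) + 1)
        for combo in combinations(words, r)
    ]


# The order in which keywords are given does not matter.
print(compound_label(["video", "text", "photo"]))  # photo-text-video

# C(6, 4): six keywords taken four at a time.
print(comb(6, 4))  # 15

# Sum of C(6, r) for r = 0..6, which equals 2**6.
print(sum(comb(6, r) for r in range(7)))  # 64

# Enumerating the labels explicitly skips the empty combination (r = 0),
# so a six-keyword facet gives 2**6 - 1 labels.
facet = ["audio", "graphics", "photo", "table", "text", "video"]
print(len(all_compound_labels(facet)))  # 63
```

The explicit enumeration gives 63 rather than 64 because a post cannot carry an empty label. With 5 facets of 6 keywords each, that is at most $63\times 5=315$ usable labels, still well within the 2,000-label budget.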









07 April 2015
Combinatorics of compound keyword system for blog content classification
Assistant Professor, Department of Physics, Ateneo de Manila University. Program Head, Upper Atmosphere Dynamics, Manila Observatory.