
Open Source super-linear growth

Once the territory of a select few developers, researchers and engineers, open source is now embraced and supported by governments and large software corporations.

Today, while writing my technical report, I came across the article “The Growth of Open Source Software in Organizations”. It is an excellent resource for up-to-date information on the growth of open source in industry and its implications. The following point from the executive summary, based on a survey of about 512 companies, reflects the super-linear growth of open source software development:

Organizations are saving millions of dollars on IT by using open source software. In 2004, open source software saved large companies (with annual revenue of over $1 billion) an average of $3.3 million. Medium-sized companies (between $50 million and $1 billion in annual revenue) saved an average $1.1 million. Firms with revenues under $50 million saved an average $520,000. Asked to categorize all the benefits (cost savings and other) from open source, most companies said they were moderate or major. Some 70% of large firms are seeing moderate or major benefits from open source. Of the companies under $1 billion in revenue, 59% are seeing major benefits.

Google Experimental search

Try out the new features Google is planning to add to improve the search experience at Google Experimental. Google Experimental is a new application launched by Google Labs, where they make available the features they are planning to implement for the search engine. The strategy is to get feedback from users about a proposed feature and implement it if the feedback is consistently positive. This technique works well for both Google and its users. Users get the features they like, and Google gets a product that is easier to sell. It greatly reduces the probability of a product failure and potential customer loss. The strategy also reduces the effort spent on developing and testing features which will never be implemented. There are also some pitfalls. From Google’s perspective, they are exposing future features to their competitors in advance. In addition, it may reduce user appreciation for new features, because by launch the features are already 'old' to them.

Summarizing, I think that the new 'preview' strategy for the new products can serve as a win-win deal for both Google and the users.

Geometric growth of the Linux kernel

Yesterday I read the article “Growth, Evolution, and Structural Change in Open Source Software” by Michael Godfrey and Qiang Tu. The article analyses the growth of the Linux kernel in terms of lines of code since the first release. The statistical model developed for the uncommented lines of code is shown below. Even after crossing two million lines of code, the Linux kernel enjoys geometric growth in terms of lines of code. The model for the commented lines of code shows a similar trend.

Model: y = 0.21 * x^2 + 252 * x + 90,055

where,

y = size in uncommented LOC

x = days since v1.0

r^2 = 0.997 (coefficient of determination calculated using least squares)
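To get a feel for what this fitted model implies, here is a minimal Python sketch that simply evaluates it. The coefficients are the ones quoted above; the function names (`kernel_loc`, `growth_per_day`, `days_to_reach`) are my own illustrative choices and do not come from the paper.

```python
import math

# Coefficients of the quadratic model quoted above: y = A*x^2 + B*x + C
A, B, C = 0.21, 252.0, 90_055.0


def kernel_loc(days: float) -> float:
    """Estimated uncommented LOC `days` days after v1.0."""
    return A * days ** 2 + B * days + C


def growth_per_day(days: float) -> float:
    """Derivative of the model: estimated lines added per day at `days`."""
    return 2 * A * days + B


def days_to_reach(target_loc: float) -> float:
    """Larger root of A*x^2 + B*x + (C - target_loc) = 0."""
    disc = B ** 2 - 4 * A * (C - target_loc)
    if disc < 0:
        raise ValueError("the model never reaches this size")
    return (-B + math.sqrt(disc)) / (2 * A)


if __name__ == "__main__":
    for d in (0, 1000, 2000):
        print(d, round(kernel_loc(d)), round(growth_per_day(d)))
    print(round(days_to_reach(2_000_000)))
```

Because the derivative 0.42 * x + 252 keeps rising with x, the model adds more lines per day the older the kernel gets. Strictly speaking, a quadratic fit like this is super-linear (polynomial) rather than exponential.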

Linux enjoys active support from the ever-increasing open source developer community, which enables it to sustain such tremendous growth. More than half of the code is for the various drivers, which are independent of the system. The authors also analyzed Fetchmail, the GCC compiler and the VIM editor and concluded that ‘the evolution of each open source system is different and cannot be generalized’.

The interesting question is: what’s the trend for open source development as a whole? Is it increasing linearly, geometrically, or maybe decreasing? Successful projects like MySQL, Apache, Eclipse, SugarCRM, and OpenOffice suggest that Open Source must be growing at a super-linear rate. Still, a formal analysis of open source is required to validate this hypothesis.

Obesity Map

The Department of Health and Services recently published a survey showing the exponential growth of obesity (BMI > 30) in the United States. The complete survey can be accessed by clicking here. If you observe the map carefully, you can see some patterns.
The southern states seem to suffer from obesity more than the northern states. Also, the west coast states are much healthier than the east coast states. Mississippi and West Virginia had a prevalence of obesity equal to or greater than 30%. Today, obesity has become a much bigger problem than smoking was in the late 70's and 80's (thanks to the cheap crap fast food available). The importance of good health and nutrition should be emphasized in the educational curriculum. The youth today should be well educated about the ill effects of obesity. I think the influence of family and parents will play a vital role.
Educational institutes play the most important part. For example, there are many fast food restaurants on campus at the university where I study in Ohio. In the classroom we learn that fast food is bad, and when we step outside the classroom we see only fast food all around. Thus, today we need not just educational theory but some practical steps to curb the trend. The active role of educational institutes, government and parents will decide the fitness of the youth tomorrow.



Percent of Obese (BMI > 30) in U.S. Adults-2006

India a bigger market than US for Nokia

India has overtaken the US to become the second largest market (in terms of sales) for Nokia after China. Today India has about 185 million mobile users, a number that is increasing at an unprecedented rate. I got my first mobile phone, a Nokia 3310, in 2003. I was in my third year of engineering at that time and was among the very few who used a mobile phone. Today, in 2007, a mobile phone has become an ordinary thing for a college student. In 2004 Nokia had just 400 employees and today it has more than 9000. Just 3 years ... and such a big difference!!!

Dreaming of your own house?

If you are dreaming of your own home sweet home, you should certainly avoid the following ten cities. Forbes recently published a report listing the ten least affordable cities for real estate. California seems to be the most expensive relative to salary, with Los Angeles (thanks to Hollywood), San Francisco (thanks to Silicon Valley) and San Diego topping the list. It is a bit surprising to see San Diego, CA ahead of New York, which is considered the economic capital of the United States. The absence of Chicago, IL and Phoenix, AZ is also somewhat interesting.

  1. Los Angeles, CA
  2. San Francisco, CA
  3. San Diego, CA
  4. New York, NY
  5. Miami, FL
  6. Sacramento, CA
  7. Las Vegas, NV
  8. Seattle, WA
  9. Boston, MA
  10. Orlando, FL

Population Clock

Check out this webpage where the US Census Bureau estimates the world population in real time. Just wait for some time and refresh the page. You will see the increase in population in the few seconds you took to refresh the page.
The number I got on 08/25/07 at 01:50 GMT.

TCS joins the Open Source Initiative

I will be completing my internship with the Open Source Research Group at SAP Labs, Palo Alto next month. In the past five months I have learned a lot about Open Source and its strong presence in Europe and America. The sad thing was that, despite India's great infrastructure and huge computer industry, I couldn't find a single Indian firm or project contributing to Open Source development. But things are changing; young Indian developers and large corporations are now finding value in the Open Source initiative. Today I came across the Open Source project "WANEM - The Wide Area Network Emulator", supported by the Indian IT giant Tata Consultancy Services. The project is hosted on SourceForge and licensed under the GNU GPL.

I strongly believe that Indian companies will benefit the most from Open Source, especially considering that most of them are service providers and not product companies. By embracing Open Source development, the Indian IT sector can make its future much more secure and less dependent on outsourcing. Especially with the available infrastructure and a large pool of extremely talented developers, Open Source can transform the current business model, which is heavily dependent upon outsourcing.


I hope other Indian firms follow the example set by TCS.

Tata eyes global market

India Inc. seems to be shining not just in IT outsourcing and BPOs but also in technological fields like the automotive industry. When I was a kid (which was not long ago), I could hardly think of an Indian car which had a global presence or, for that matter, even a national presence. And today I read the news that Tata is looking to buy Land Rover and Jaguar from their parent company Ford. Tata has already launched the first indigenous Indian car in the European market, and if it succeeds in closing the current deal the Indian brand will truly go global. It will also help Tata reduce its dependence on the Indian market, which brings in more than 90% of its revenue. Let us hope that Tata Motors succeeds in closing the deal.

fi yuo cna raed tihs, yuo hvae a sgtrane mnid too

i cdnuolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg. The phaonmneal pweor of the hmuan mnid, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it dseno't mtaetr in waht oerdr the ltteres in a wrod are, the olny iproamtnt tihng is taht the frsit and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it whotuit a pboerlm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Azanmig huh? yaeh and I awlyas tghuhot slpeling was ipmorantt!

KDD 2007 videos

The Knowledge Discovery and Data Mining (KDD) conference 2007 was held at the Fairmont, San Jose. All the videos for the invited talks, panel discussion and industry & research track presentations are now available online here.

Chak De India: Inspired by a true story

By now many of you must have already seen the latest film Chak De India from Yash Raj Productions. But do you know that the film is inspired by a true story?

Chak De India is inspired by the real-life story of Indian hockey player Meer Ranjan Negi. Negi was accused of match fixing in 1982 when India lost to Pakistan in the Asian Games at Delhi. Negi later took up the job of coaching the Indian women's hockey team. Under Negi's leadership, the women’s hockey team won the gold medal at the Manchester Commonwealth Games in 2002.

Grand Canyon of MARS

Enjoy the animation of the Mariner Valley on Mars. It is the largest and deepest known canyon in the whole solar system.

GOOG-411

Google recently started a local voice search service for getting information over the telephone. So what’s this stuff? Suppose you are driving in a new city and you want to find the nearest pizza place. What do you do? Just drive around and hope to find one, call up a friend who has internet access, or just not eat? Well, GOOG-411 addresses such situations. You can call 1-800-GOOG-411 (1-800-466-4411) and get the required information instantly. You can also get connected to the business (in our case the pizza shop) for further details. You can get the details by SMS if you are using a mobile phone. And the best part is that all the services are free except the phone charges you incur (which depend upon your service provider).
The system is still in an experimental stage while the voice recognition system is fine-tuned. For further details visit: http://labs.google.com/goog411/index.html
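As a small aside, the digits in parentheses above are just the phone-keypad spelling of GOOG-411. Here is a minimal Python sketch of that conversion, assuming the standard keypad letter layout; the helper name `vanity_to_digits` is my own and is not part of any Google service.

```python
# Standard phone keypad letter groups (2=ABC ... 9=WXYZ).
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def vanity_to_digits(number: str) -> str:
    """Replace letters with keypad digits; digits and dashes pass through."""
    return "".join(LETTER_TO_DIGIT.get(ch.upper(), ch) for ch in number)


print(vanity_to_digits("1-800-GOOG-411"))  # prints 1-800-4664-411
```

The dashes fall in different places, but the digits are the same as in 1-800-466-4411.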

English to Hindi

Just came across Google’s new tool for transliterating English text into Hindi in Devanagari script. Please visit http://www.google.com/transliterate/indic/ for details. The tool works really well with Hindi formatting and grammar. This is certainly the best tool available so far for automatic conversion to Devanagari script.

Open Source Licenses

Open Source development is crowded with different licenses and their various clauses and conditions for code reuse and distribution. It is interesting to see how many licenses a single project uses. The natural and most logical answer would be 1, which is correct for most projects. I analyzed more than 5000 different projects listed in OSSmole. The results are as follows:



About 93% of open source projects use just 1 license. The most fascinating point is that some projects use more than 2 licenses. It would be very interesting to study such projects and the requirements or process/business model that call for more than two different licenses. Another interesting point is that some projects do not use any software license (yeah !!!). At face value that means they place no restriction on usage, although strictly speaking code without a license remains under default copyright. A closer analysis of such projects would also be of interest.
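The kind of count behind these percentages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `rows` below are made-up (project, license) pairs, not the OSSmole data, and the function name `license_count_distribution` is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (project, license) rows, invented for illustration only.
rows = [
    ("proj-a", "GPL"),
    ("proj-b", "GPL"),
    ("proj-b", "LGPL"),
    ("proj-c", "BSD"),
    ("proj-d", None),  # a project that lists no license at all
]


def license_count_distribution(rows):
    """Return {number_of_licenses: percentage_of_projects}."""
    licenses = defaultdict(set)
    for project, license_name in rows:
        names = licenses[project]  # registers the project even without a license
        if license_name:
            names.add(license_name)
    counts = Counter(len(names) for names in licenses.values())
    total = len(licenses)
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}


print(license_count_distribution(rows))
```

With real data, the entry for 1 would be the roughly 93% figure, and the entries for 0 and for 3 or more would be the unusual projects worth a closer look.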

Independence Day

India celebrates its 60th Independence Day tomorrow, on 15th August. Here are some quick facts I came across today about India's glorious past.

1. India never invaded any country in her last 1000 years of history.
2. The number ‘Zero’ was invented by Aryabhatta, an Indian scientist.
3. The world's first university was established in Takshila in 700 BC. More than 10,500 students from all over the world studied more than 60 subjects. The University of Nalanda, built in the 4th century BC, was one of the greatest achievements of ancient India in the field of education.
4. According to Forbes magazine, Sanskrit is the most suitable language for computer software.
5. Ayurveda is the earliest school of medicine known to humans.
6. The art of navigation was born in the river Sindh 5000 years ago. The very word "Navigation" is derived from the Sanskrit word NAVGATIH.
7. The value of pi was first calculated by Budhayana, and he explained the concept of what is now known as the Pythagorean Theorem. In 1999 British scholars officially published that Budhayana's works date to the 6th century.
8. Algebra, trigonometry and calculus came from India. Quadratic equations were solved by Sridharacharya in the 11th century.
9. According to the Gemological Institute of America, up until 1896, India was the only source of diamonds to the world.
10. IEEE has proved what has been a century-old suspicion amongst academics: that the pioneer of wireless communication was Professor Jagdeesh Bose and not Marconi.
11. The earliest reservoir and dam for irrigation was built in Saurashtra.
12. Chess was invented in India.
13. Sushruta is the father of surgery. 2600 years ago, health scientists of the time conducted surgeries such as cesareans and operations for cataracts, fractures and urinary stones. The use of anesthesia was well known in ancient India.
14. When many cultures in the world were only nomadic forest dwellers over 5000 years ago, Indians established the Harappa culture in the Sindhu Valley (Indus Valley Civilization).
15. The decimal system was developed in India in 100 BC.

Let's work together to bring back the glorious past and make the world a peaceful place to live. Special thanks to Haresh for sending me the facts.

Rich and Green Pune!

The Centre for Development Studies and Activities (CDSA) is an internationally renowned, autonomous research, training and policy-making institution. For the past thirty years CDSA has been addressing issues concerning the management of the environment while promoting development as well as the reduction of poverty and the creation of wealth.
The Times of India Pune and CDSA are teaming up to start a series of articles that will be published every Monday, aimed at building awareness amongst the citizens of Pune of the various issues related to the proper development of the city. The idea is to build a citizens' forum geared towards understanding the city, its people and the problems they face, so as to come up with better, more sustainable and productive solutions. We feel this is the need of the day, as we are all in it together. Remember: every Monday's Times.

Aims and Objectives of CDSA
1. To conduct research on development problems and processes, to experiment with planning methods, to develop techniques of evaluation which give accurate feedback on the nature and type of changes taking place in the society.
2. To teach and train participants about the problems and processes of development, to impart skills to them on methods and techniques of planning, implementation, administration and evaluation.
3. To help governments, its agencies and other public bodies by training their personnel, carrying out pilot projects for them, conducting research in areas of public policy and decision-making, and giving them correct and accurate feedback.
4. To carry out research and advocacy in areas of public policy, decision making and participatory governance at various levels.
5. To enter into contracts with individuals, firms, companies, societies, institutions, agencies, government and non government organizations in India and abroad which further the objectives of CDSA.
For more information please visit: www.cdsaindia.org/
For more details contact Siddharth Benninger at siddharth.benninger@gmail.com

Knowledge Discovery and Data Mining -Day 1 & 2

The 13th International Conference on Knowledge Discovery and Data Mining (KDD 2007) started on Saturday, 11 Aug, 2007 at the Fairmont Hotel, San Jose, California. The conference is sponsored by Microsoft adCenter Labs (Platinum supporter), Google, Yahoo, Oracle (Gold supporters) and the KDD organization (Organizational support). Registration started at 5.00 p.m., with no technical sessions or workshops on the first day. The volunteers had a review session from 3.00 p.m. to 4.00 p.m. (I am working as a student volunteer for the conference this year). The technical workshops and tutorials started on Sunday at 9.00 in the morning. The full-day workshops covered Data Mining and Audience Intelligence for Advertising; Data Mining in Bioinformatics; KDD Cup; Knowledge Discovery from Sensor Data; Privacy, Security, and Trust in KDD; and Web Mining and Social Network Analysis. The half-day workshops included Multimedia Data Mining; Mining Multiple Information Sources; Challenge on Time Series Classification; Data Mining Standards, Services and Platforms; and Domain Driven Data Mining.
The tutorial topics included Mining Large Time-evolving Data Using Matrix and Tensor Tools; Statistical Framework for Mining Data Streams; Statistical Modeling of Relational Data; Text Mining and Link Analysis for Web and Semantic Web; Trees to Forests and Rule Sets - A Unified Overview of Ensemble Methods; Learning Bayesian Networks; and Mining Shape and Time Series Databases with Symbolic Representations.
The morning session ran for three hours with a twenty-minute coffee break in between, followed by lunch and then the afternoon session in the same format.
In the evening there was an award ceremony for the best paper, student travel and Service and Innovation Awards, followed by the Innovation Award Talk by Usama Fayyad. The conference attracted researchers in data mining and knowledge discovery from around the globe. I personally met some researchers and graduate students working in my area of interest. The conference presented state-of-the-art research in data mining and provided a perfect platform for networking and meeting leading researchers and industry personnel. I look forward to the next three days, which should be very exciting and technically challenging.

Open Source Target Customers


Open source is largely targeted at building tools and software for developers. The results below show that 31% of open source projects have a target audience of developers. These projects mainly produce easy-to-use applications for software development, like Eclipse, SVN and CVS. Second on the list are desktop applications for end users, with about a 26% share. These applications, which may serve as substitutes for commercially licensed software, are targeted at the end user. Examples in this category include OpenOffice, R, and Weka. Interestingly, there are some applications with audiences in areas such as religion, the legal industry and education. A list of the top 20 audiences/target industries for open source projects is summarized below.

SourceForge Project Categories


SourceForge classifies projects into seven different categories, i.e. Planning, Pre-alpha, Alpha, Beta, Production, Mature and Inactive, according to their development status.


Observations
1. There are a lot of single-developer projects which do not show any significant activity after the initial project registration. The percentage of projects in the planning stage indicates this.
2. The highest number of projects is in the beta testing phase. Thus, one may expect a lot of open source projects in the production/stable category in the near future.
3. The mature category has the lowest number of projects, which indicates that very few projects ultimately mature.
4. Some projects are in the inactive state, where the project administrator has declared the project shut down. SourceForge frequently removes these projects from the directory.

Superpower India: Following the sine curve?

In my earlier article "India the Superpower: Extraordinary or usual?" I proposed that the rise and fall of any empire, nation or culture follows a sinusoidal wave. In this article I have plotted the changes in the modern state of India along the dimensions of economic prosperity and development stage. A flourishing economy in the seventeenth century, reduced to an underdeveloped and poor country during the colonial period, and the free and shining India of today: all of these sinusoidal phases are plotted on the curve below. Based on current trends and patterns, it is estimated that India will become a superpower by 2040.

India the Superpower: Extraordinary or usual?

The GDP of both India and China is growing at a rate of more than 10%. Both nations are expected to be superpowers in the twenty-first century. Is this change in India, which was always perceived as a poor and underdeveloped country, extraordinary or usual?



Historically, in the seventeenth century, India was referred to as ‘Sone Ki Chidiya’ (the Golden Bird). In that era India was more advanced and richer in culture and economy than even Constantinople and the European superpowers. Things started changing as the western colonists expanded their trade empires from 1800 till 1947. In 1600, when the East India Company was founded, Britain was generating 1.8% of the world's GDP, while India was producing 22.5%. By 1870, at the peak of the Raj, Britain was generating 9.1%, while India had been reduced for the first time to the epitome of a Third World nation, a symbol across the globe of famine, poverty and deprivation (reference: Time, Thursday, Aug. 02, 2007).
Today, India has more software engineers than any other country. Manufacturing, retail, software and, for that matter, each and every segment is growing at an overwhelming rate. Is this change extraordinary or just out of the blue? My answer: Nope. It just follows the law of equilibrium.

It was the Indian subcontinent in the seventeenth century, followed by the rise of America and Russia after the world wars, and now it is again the Asian subcontinent. World trade has followed a sinusoidal wave. It increases, reaches a maximum, breaks even, declines and then increases again. History has seen this sinusoidal behavior in the rise and fall of great empires, nations and cultures, and the trend continues in today’s information technology age. The interesting parameter is ‘Time’, which is on the X axis. The period has varied and will continue to vary depending upon the initial conditions, inequality, intrinsic and external factors relating to a nation, momentum or rate of change, etc. It is estimated that China and India will surpass the GDP of the USA by 2030 and 2045 respectively.
Thus the rise of India and China is not that big a surprise. With a combined population of more than 2 billion, it is just a matter of time before these two humongous pieces of land start dominating the world economy.

‘Yahoo’ googled more than Google?

Yes, that’s true; just try it yourself. Yesterday, I used Google Trends to compare searches for Google and Yahoo on Google.com and guess what … I found that ‘Yahoo’ is googled more than the word ‘Google’ itself. Isn’t this interesting, considering that the two are arch rivals targeting the same web search business?

What this result means is that people actually go to Google and search for Yahoo services there. This seems a bit absurd to me. If they want to use Yahoo services, why should they go to Google.com and search for Yahoo? They can easily go directly to Yahoo... right? On top of this, why would one ever go to Google.com and google the keyword ‘Google’ itself? To see the results I performed this search (Google on Google.com) myself. The results were as follows (in descending order): Google Maps, Google News, Google Video, Google Groups, and then Google.com itself. Thus googling Google gives Google as the fifth result and not the first. Anyway, the question remains unanswered: what is the motivation behind googling Google or Yahoo?

Open Source Projects Development Stages

SourceForge classifies projects into seven different categories, i.e. Planning, Pre-alpha, Alpha, Beta, Production, Mature and Inactive, according to their development status. The following article shows an analysis in which I have quantified the percentage of projects in each category.



Observations

1. There are a lot of single-developer projects which do not show any significant activity after the initial project registration. The high percentage of projects in the planning stage indicates this.
2. The highest number of projects is in the beta testing phase. Thus one may expect a lot of open source projects in the production/stable category in the near future.
3. The mature category has the lowest number of projects, which indicates that very few projects ultimately reach maturity.
4. Some projects are in the inactive state, where the project administrator has declared the project shut down. SourceForge frequently removes these projects from the directory.

# developer vs. stage

In my earlier post I analyzed the number of projects according to their development stage. I then took it a step further and analyzed the average number of developers per project in each development stage. The table below shows the average number of developers over all the projects in each development stage.



It can be seen that the number of developers keeps increasing as the project advances from the pre-alpha to the mature stage. There is an initial drop from the planning to the pre-alpha stage, but once the project is started there is a consistent increase in developers. Thus the community around the project grows as the project's development advances. The lowest number of developers can be seen in the inactive stage. A decreasing number of participants may be one of the reasons a project becomes inactive. Hence, one may conclude that active participation from the open source community is a must for a successful open source project. If there is an active open source community there is open source!!!
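For readers who want to repeat this kind of analysis, the per-stage average can be sketched in Python as follows. The `projects` records are invented for illustration and are not the SourceForge figures behind the table; the function name `average_developers_by_stage` is likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical (stage, developer_count) records, one per project.
projects = [
    ("Planning", 2), ("Planning", 1),
    ("Pre-alpha", 1), ("Alpha", 2),
    ("Beta", 3), ("Beta", 2),
    ("Mature", 5), ("Inactive", 1),
]


def average_developers_by_stage(projects):
    """Return {stage: mean number of developers per project in that stage}."""
    totals = defaultdict(lambda: [0, 0])  # stage -> [developer sum, project count]
    for stage, developers in projects:
        totals[stage][0] += developers
        totals[stage][1] += 1
    return {stage: dev_sum / count for stage, (dev_sum, count) in totals.items()}


for stage, avg in average_developers_by_stage(projects).items():
    print(f"{stage}: {avg:.1f}")
```

Sorting the result by stage order (Planning through Inactive) would make the trend described above easy to read off.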