Typology (cluster) analysis (AskiaAnalyse)

When you have a large volume of data, it can be useful to regroup similar interviews into groups (clusters): this is called cluster analysis. If the groups are very homogeneous, analysing each group is enough to describe the population. The method described here is called k-means, and there are a great number of variants of it.

k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. The algorithm works as follows:

1. We start by arbitrarily choosing a number k of groups and randomly selecting k prototype interviews from the population.
2. We assign each interview to the group of its closest prototype.
3. We calculate the barycentre (average respondent) of each of the k groups.
4. We then repeat step 2 with the k average respondents instead of the prototype interviews. We continue this process until we reach a stable partition, that is, until no interview changes group.
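The steps above can be sketched in plain Python as follows. This is a minimal illustration assuming each interview is a tuple of numeric answers compared with the Euclidean distance; the function names (`k_means`, `barycentre`, `squared_distance`) and the `seed` and `max_iter` parameters are hypothetical and are not part of AskiaAnalyse.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as sequences of numbers."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def barycentre(points):
    """Component-wise mean (average respondent) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, seed=None, max_iter=100):
    """Partition `points` (a list of tuples) into k groups; return (labels, centres)."""
    rng = random.Random(seed)
    # Step 1: pick k interviews at random as the initial prototypes.
    centres = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to the group of its closest centre.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        # Stop as soon as no point changes group: the partition is stable.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: replace each centre by the barycentre of its group.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centre if a group becomes empty
                centres[c] = barycentre(members)
    return labels, centres


# Usage: two well-separated clouds of three interviews each.
data = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
labels, centres = k_means(data, k=2, seed=0)
```

Because the initial prototypes are drawn at random, different seeds can end in different stable partitions on less clear-cut data; running k-means several times and keeping the best run is a common practice with this method.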

For example, to look for a partition into two classes, we randomly select two prototypes, associate each point with its closest centre, and calculate the barycentres of the two classes. Repeating the association and barycentre steps, we obtain a stable partition.
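The two-class walkthrough can be traced on a handful of one-dimensional values. In this sketch the prototypes are fixed instead of random so that every pass is reproducible; the function name `trace_two_class_partition` and the sample values are illustrative assumptions.

```python
def trace_two_class_partition(values, prototypes):
    """Run the two-class walkthrough on 1-D values and record every pass.

    Returns a list of (labels, centres) pairs, one per pass that changed
    the partition; the loop stops when a pass leaves the partition unchanged.
    """
    centres = list(prototypes)
    history = []
    labels = None
    while True:
        # Associate each value with its closest centre (ties go to class 0).
        new_labels = [
            0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1 for v in values
        ]
        if new_labels == labels:
            return history  # stable partition reached
        labels = new_labels
        # Replace each centre by the barycentre (mean) of its class.
        for c in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
        history.append((labels, tuple(centres)))


h = trace_two_class_partition([1, 2, 3, 10, 11, 12], (1, 2))
```

With the prototypes 1 and 2, the first pass puts only the value 1 in the first class (barycentres 1 and 7.6). The second pass moves 2 and 3 into the first class (barycentres 2 and 11), and the third pass changes nothing, so the partition is stable.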

The method depends on the distance metric used:

Euclidean distance for numeric questions;
Chi² distance for counts.
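The two metrics can be sketched as below. The Chi² version is the standard chi-square distance between row profiles of a table of counts (as used in correspondence analysis); the exact formula AskiaAnalyse applies is not stated here, so treat it as an assumption. Function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two respondents described by numeric answers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chi2_distance(table, i, k):
    """Chi-square distance between rows i and k of a table of counts.

    Each row is turned into a profile (counts divided by the row total);
    squared differences between the two profiles are weighted by the
    inverse of each column's share of the grand total.
    """
    grand_total = sum(sum(row) for row in table)
    column_share = [sum(col) / grand_total for col in zip(*table)]
    row_i_total = sum(table[i])
    row_k_total = sum(table[k])
    return math.sqrt(sum(
        (table[i][j] / row_i_total - table[k][j] / row_k_total) ** 2 / column_share[j]
        for j in range(len(column_share))
    ))


# Rows 0 and 1 have the same profile (1/3, 2/3), so their Chi² distance is 0
# even though their raw counts differ.
table = [[10, 20], [20, 40], [30, 10]]
```

This illustrates why Chi² suits counts: it compares proportions rather than raw sizes, whereas the Euclidean distance on the raw counts of rows 0 and 1 would not be zero.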

With the Euclidean metric, variance is used as the measure of cluster scatter. The number of clusters k is an input parameter: an inappropriate choice of k may yield poor results. To measure the quality of the partition, we calculate the inertia (scatter) of each cluster. Huygens' theorem tells us that, regardless of the partition, the sum of the within-group variance and the between-group variance is constant: it equals the total variance of the data.
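Huygens' decomposition can be checked numerically. This sketch assumes every interview has the same weight and works with sums of squares; dividing all three values by the number of points turns them into variances without changing the identity. The function name is hypothetical.

```python
def inertia_decomposition(points, labels):
    """Return (total, within, between) sums of squares for a partition.

    points: list of equal-length tuples of numbers; labels: group of each point.
    """
    dims = range(len(points[0]))
    n = len(points)
    overall = [sum(p[d] for p in points) / n for d in dims]
    # Total: scatter of all points around the overall barycentre.
    total = sum(sum((p[d] - overall[d]) ** 2 for d in dims) for p in points)
    within = 0.0
    between = 0.0
    for g in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == g]
        centre = [sum(p[d] for p in members) / len(members) for d in dims]
        # Within: scatter of the group's points around their own barycentre.
        within += sum(sum((p[d] - centre[d]) ** 2 for d in dims) for p in members)
        # Between: group size times squared distance from group centre to overall centre.
        between += len(members) * sum((centre[d] - overall[d]) ** 2 for d in dims)
    return total, within, between


pts = [(1,), (2,), (3,), (10,), (11,), (12,)]
good = inertia_decomposition(pts, [0, 0, 0, 1, 1, 1])  # (125.5, 4.0, 121.5)
poor = inertia_decomposition(pts, [0, 1, 0, 1, 0, 1])  # (125.5, 112.0, 13.5)
```

Both partitions share the same total (125.5); only the split between within and between changes, which is what makes the between share a fair measure of partition quality.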

We calculate the following:

info % = variance between clusters / (variance between clusters + variance within clusters)

Because the denominator is the total variance, which is constant, this gives the percentage of information retained despite the regrouping. By running the typology several times with a different number of groups, you can choose the most appropriate partition.
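Writing B for the between-cluster variance, W for the within-cluster variance and T for the total variance (symbols introduced here for illustration), Huygens' theorem turns the ratio into a share of a fixed total. As a small worked example, take the six values 1, 2, 3, 10, 11, 12 split into {1, 2, 3} and {10, 11, 12}, using sums of squares (dividing all three by 6 gives variances and leaves the ratio unchanged):

```latex
\text{info\%} = 100 \times \frac{B}{B + W} = 100 \times \frac{B}{T},
\qquad T = W + B

T = 125.5, \quad W = 2 + 2 = 4, \quad B = 3(2 - 6.5)^2 + 3(11 - 6.5)^2 = 121.5

\text{info\%} = 100 \times \frac{121.5}{125.5} \approx 96.8
```

At the extremes, a single group gives B = 0 (0 %) and one group per interview gives W = 0 (100 %), so info % tends to rise as groups are added; comparing how much it gains from one number of groups to the next is one reasonable way to judge when extra groups stop paying off.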

To perform a typology (cluster) analysis:

In the general tab, select typology.
Add the closed or scale response questions to use in the calculations to the active tab (for example, by dragging and dropping them from the questionnaire list).
If you want to display other closed questions in the analysis, but not include them in the calculations, add them to the inactive tab.
In num of groups, specify the number of groups you want to include in your analysis.
Click results to view the analysis.

One page of results is generated for each group you specified. When you open a page, you will see the counts, percentages, sigma, base and significance within that group.

You can change the specific settings used for the analysis. To do so, click analysis options.... For details, see analysis options.

It is possible to create a variable based on your analysis. To do so, click create variable.... For details, see creating a variable.

Once you have generated your results, you can view a log that allows you to review the quality of the typology. To do so, in the general tab click log.... The log file opens in a dedicated window. From here, you can browse the log or save it to your computer as an HTML file. At the end of the log file, the typology selected as the best run is indicated.
