Help for Choice Components Analysis

  1. The purpose of Choice Components Analysis
  2. The proximal sources of decisions
  3. The choice models
  4. Completing the form
  5. Types of analysis
  6. Selecting the model base
  7. Selecting input data
  8. Using labels
  9. Applying sample weights
  10. Comparative selection data types
  11. Types of probability scale
  12. Values used to infer selection
  13. Treatment of tied selections
  14. Assigning missing values
  15. Setting blockage codes
  16. Analysis output
  17. Statistical significance of option coefficients


1. The purpose of Choice Components Analysis

Choice Components Analysis is a program for analyzing the immediate causes of selections/choices/decisions across sets of options. The analysis unit may be single cases or a sample of cases. The analysis reveals the contributions of proximal causes to each option's selection or non-selection. The contributions are represented by the marginal changes to probabilities of selection for each case and the averages of these marginal changes across any sample.

Suitable data for analysis includes selections, selection expectations, ratings, or rankings for option choice or desirability across a set of options, together with information on any factors that block option selections - such as unawareness, unaffordability, irrelevance, and so on. Ideally such information is available for every option for every case. This can be done economically by applying structured collection methods so that questioning is filtered to avoid redundancy. Choice Components Analysis provides an integrated approach to analysis of such data.

For further background on Choice Components Analysis theory together with an example analysis you can download the pdf file: "An approach to the ecological analysis of decisions"


2. The proximal sources of decisions

For each case, Choice Components Analysis analyzes every option (whether finally selected or not), in terms of two classes of proximal selection determinants:

  1. Sufficient identification.
  2. Comparative selection.

Either or both may make a contribution to the selection status of an option for a case.

This two-part classification accounts for choices in a manner analogous to dual process theories of decision-making, such as Daniel Kahneman's System 1/System 2 descriptions. Note, however, that sufficient identification adds a negative blocking component to the proximal causes of non-selection.

For each of these two proximal determinants, the analysis package results show the factor contributions to individual selections and aggregate shares in terms of an adjustment from prior expectations, based on indifference or an informed set of priors. Sufficient identification and comparative selection, separately or together, account for the proximal causes of both selection and non-selection of each option, case-by-case, and, via aggregation, for samples.

The defining characteristic of a sufficient identification as a determinant of choice is that it applies to an option without reference to any other option. Examples of sufficient identifications that may rule an option unselectable include non-recognition or non-recall; ignorance or unfamiliarity; unavailability, inaccessibility, or ineligibility; and exclusion, elimination, indecision, deferral, or inertia. Such option identifications block comparative selections amongst options to which they apply. The contribution of each type of blockage to the outcomes across the options for each case can be separately identified and analyzed.

In addition to blocking an option, sufficient identification may also rule an option as automatically selected. For example, automatic selection of an option can result from behavioural scripts, habits, impulses, compulsions, reflexes, and so on. Such selections automatically mean that all other options are ignored, i.e. blocked. Likewise, when all but one option is blocked, the remaining option is sufficiently identified for automatic selection as no other option is available.

All sufficient identifications determining selection or non-selection of an option are accounted for, directly or indirectly, by blockage.

Sufficient identifications may apply to one, some, all, or no options for any case. Where one or more options are ruled out by blockage, the probabilities of selection amongst the alternatives are increased by the corresponding amount spread across these selectable options.

Comparative selections, on the other hand, are based, explicitly, or implicitly (by way of comparison with a standard), on relations between options. Here, options are selected or rejected (or tied for selection) on account of their relative attraction. Comparative selections apply only between unblocked options.

Throughout the remaining text and in the program documentation the effects, both positive and negative, of sufficient identification, are referred to as "blockage".

Some example data and Choice Components Analysis output are included in the downloaded file ExampleAnalyses.xls


3. The choice models

Two choice models are provided:
  1. Indifference model
  2. Informed priors model

Each model provides a different base perspective or orientation from which to view the results of analysis. The views from each perspective can be very different. Both are valid views but their use depends on the objectives of the research. The indifference model makes no assumptions about the selection likelihoods amongst any set of options. Informed priors, on the other hand, may start with respondents' last known decision (e.g. from an earlier poll) as a basis for evaluating change and current choices.

The indifference model provides each option with the same chance of selection. The initial probability for each option is equal to one divided by the number of options. The effects of the causal factors in decision-making are measured as deviations from this base. The calculations for each option for each case are made using the following formula. The left-hand term shows that the formula is concerned with a total accounting for both selections and non-selections across all the options for each case. The four terms on the right-hand side account for indifference, blockage, comparative selections, and residual error, respectively.

Indifference formula

In the equation, O refers to an option and S refers to a selectable (unblocked) option. The sum of O is simply a count of the options in the set while the sum of S is the count of unblocked options. E refers to the respondent's expectation of an option's selection, either directly obtained or inferred from their comparative selection data. When an actual selection is substituted for a choice expectation in the third term, the analysis becomes compositional only and the error term disappears.

So long as one of the options is selected, the formula sums to unity over all options for a case. Cases choosing none of the options available are automatically accounted for by the program and excluded from the analysis. Since a respondent must display some behaviour in the context of any set of options, the only situation where all options can be blocked for a case is when no residual option is available and the respondent is unable or unwilling to make a selection amongst those that are. An example would be where there is no awareness of any of the specified options.

The second term in the equation, dealing with blockage, is expanded in the analysis to account for each type of blockage contributing to a respondent's decision. Where a blockage is present the probability of selection for the option drops to zero while the probabilities of non-blocked options increase. For example, where there are three options and one is blocked, the chance for the blocked option drops from one in three to zero, while for the remaining two the selection chances increase from one in three to one in two. All such increases and decreases are attributable to the specific types of blockage identified in the data collection.
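The three-option example above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the text, not the program's own code; the function name and the use of a list of booleans to mark blocked options are assumptions made for the example.

```python
from fractions import Fraction

def indifference_with_blockage(blocked):
    """Return (base, adjusted, blockage_effect) lists for a single case.

    `blocked` holds one boolean per option; True marks a blocked option.
    (Hypothetical helper for illustration only.)
    """
    n_options = len(blocked)
    n_selectable = blocked.count(False)
    if n_selectable == 0:
        # The help text says such cases are excluded from the analysis.
        raise ValueError("all options are blocked for this case")
    # Indifference base: one divided by the number of options.
    base = [Fraction(1, n_options)] * n_options
    # After blockage: zero for blocked options, equal shares for the rest.
    adjusted = [Fraction(0) if b else Fraction(1, n_selectable) for b in blocked]
    # Blockage effect: the deviation from the indifference base.
    effect = [a - p for a, p in zip(adjusted, base)]
    return base, adjusted, effect

# Three options, the first one blocked.
base, adjusted, effect = indifference_with_blockage([True, False, False])
print(adjusted)  # zero for the blocked option, one half for each of the others
print(effect)    # minus one third for the blocked option, plus one sixth for each other
```

The blockage effects sum to zero across the options, so the adjusted probabilities still sum to unity for the case.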

The second model, the informed priors model, is quite general. (The indifference model is a special case with prior probabilities the same for every option.) In the informed priors model the base probabilities may differ between options but will still sum to unity for each case. For example, the choices in a previous poll for the same cases can provide the base from which current behaviour deviates as a measure of change. In that case each prior selection may be assigned a value of one and the remaining options zero. The idea of the priors being informed is that the results are interpretable in relation to the base. Here is the formula for the informed priors model showing that the total selection probability for all the options with respect to a case, (being equal to unity), is the sum of the prior probability plus deviations accounting for blockage, comparative selections, and any error.

Priors formula

The structure of the formula is the same as that for indifference but with R referring to the prior probability for the option. The left-hand term shows that the formula is concerned with a total accounting for both selections and non-selections across all the options for each case. As with the indifference model, there are four terms on the right-hand side; here they account for the prior probability, blockage, comparative selections, and residual error, respectively. So also the second term in the equation, dealing with blockage, is expanded in the analysis to account for each type of blockage contributing to a respondent's decision. When an actual selection is substituted for a choice expectation in the formula, E(O), the analysis becomes compositional only and the error term disappears. In practice, it is expected that priors will commonly consist of 1 for selected and 0 for unselected options. (To ensure computational tractability for every case, zeros are coded by the program as 0.0000001 in accord with Cromwell's Rule.)


4. Completing the form

Access the form

To initiate the program and access the form click the Tools menu in Excel and then the item "Choice Components Analysis". Here is the form.

Choice Components Analysis data entry form

Moving around the form

If you use your keyboard to move around the form:

  1. Use the TAB key to proceed from item to item.
  2. Use SHIFT-TAB to reverse.
  3. Use the RETURN key to execute the program.

Clearing the form

Analysis specifications are retained between runs to enable adjustments and modifications.

To clear the form and return options to defaults click the "Clear and reset" option button.

Selecting ranges of input data

Input data must be available from an already open Excel workbook.

Any set of input data should consist of cases in rows and variables in columns.

Any column labels should be in the first row of selections.

Only visible (i.e. non-hidden) columns and rows are read as input.

Hide any columns that you do not want included in the input or that you wish to exclude from a particular analysis.

You can also hide or filter cases to perform analyses for different sub-groups.

To hide or filter columns or rows use the regular Excel features before program execution.

To clear the program form so you can see the data to be selected, click the box at the right-hand side of the textbox, then click it again or press return after you have selected the data.

To select ranges with large files a simple method is to click the first cell, press F8, then scroll or page to the last cell then click it.

See also How to select input data


5. Types of analysis

The two types of analysis available are:

  1. combined comparative selection and blockage analysis, and
  2. blockage analysis alone.

If you only have blockage data click the second optionbutton.

In blockage only analysis all valid cases are included in the analysis, not just those with blockage data. The contents of cells containing other than blockage-related data have no effect on the results, but the presence of the cells is noted and counts in the analysis.


6. Selecting the model base

Click either

  1. the indifference model optionbutton, or
  2. the informed priors optionbutton.

For the indifference model the base is calculated according to the number of options to be analyzed.

For the informed priors model you will need to provide priors data for each option for each case.


7. Selecting input data

Five types of data range can be input:

  1. Analysis Data (Required)
  2. Other Data (Optional)
  3. Weight Data (Optional)
  4. Priors Data (Optional)
  5. Tie Weights Data (Optional)

Only Analysis Data is essential to perform an analysis.

For Analysis Data, Other Data, Weight Data and Priors Data input, each data range selected must have the same number of cases (plus an extra top row for labels if labels are included). The Tie Weights Data is just a single row with data for each of the options and no labels.

While all types of data may be referenced amongst the input data for a particular analysis run, only the data relevant to the specific current analysis will be tested for validity or completeness or adjusted.

Analysis Data input

This is the primary data for Choice Components Analysis.

Any single analysis can be performed on 2 to 30 options.

The number of cases is limited only by the number of rows in the spreadsheet.

The options form the columns and cases the rows. (See file ReformattingData.xls for methods to reformat data, if necessary.)

All the selected options should be in adjoining visible columns. (Use Excel procedures to hide any blank columns or columns for any options you wish to exclude from the current analysis.)

For analyses of comparative selections, in addition to any labels, all cells in the Analysis Data range must include one of the following types of data:

  1. A comparative selection which is always numeric.
  2. A blockage code which will be treated as text.
  3. A missing value code which will be treated as text.

For more on comparative selection data see Comparative selection data types.

The codes for blockage and missing data should be unique. The codes can be in alpha, numeric or alphanumeric form.
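As a rough illustration of the three cell types listed above, the following Python sketch classifies a single Analysis Data cell. It is not the program's own logic: the function name, the order of the checks, and the "invalid" outcome are assumptions for the example. The case-insensitive matching follows the descriptions in the sections on missing values and blockage codes.

```python
def classify_cell(value, blockage_codes, missing_code):
    """Classify one Analysis Data cell (hypothetical helper).

    Returns ('missing', None), ('blockage', code), ('selection', number)
    or ('invalid', value). Codes are compared as text, ignoring case.
    """
    text = str(value).strip().lower()
    codes = {code.strip().lower(): code for code in blockage_codes}
    # Assumption: codes are checked before numbers, since a code may be numeric.
    if text == missing_code.strip().lower():
        return ("missing", None)
    if text in codes:
        return ("blockage", codes[text])
    try:
        return ("selection", float(text))
    except ValueError:
        return ("invalid", value)

codes = ["unaware", "too dear"]  # example blockage codes, made up for this sketch
print(classify_cell("Unaware", codes, "NA"))  # a blockage cell
print(classify_cell(3, codes, "NA"))          # a comparative selection
print(classify_cell(" na ", codes, "NA"))     # a missing value
```

Checking codes before attempting a numeric conversion is one way to honour the requirement that codes be unique amongst the analysis data, even when a code is written in digits.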

For a blockage only analysis, in addition to any labels, only blockage-related data and missing values are required. Other cells may be empty or contain any other data but the nature of these contents will be ignored.

See also blockage codes and missing values.

Other Data input

Other Data includes any data for the cases you also wish to be associated with an analysis. The data is not used in the computation of results.

This data will be copied to the analysis output. Only visible columns will be selected. Hide any columns you wish to exclude.

Altogether, up to 220 columns of Analysis Data plus Other Data may be included in an analysis run.

This Other Data can be used in a variety of ways including filtering of cases for sub-group analysis or more generally for post-analysis of Choice Components Analysis results. (This is a simple way of keeping case data consistent after deletion of missing values and invalid data cases.)

Weight Data input

Weight Data always consists of a single column of numbers to be applied so that the analysis results are representative of the population from which the data has been drawn.

Weight Data is only evaluated and applied when the "Apply sample weights" checkbox is checked (even though it may be included in the data output).

Whenever weights are applied they are normalised to an average weight of one for the cases currently under analysis.

Priors Data Input

Priors Data has the same number of rows and columns as the Analysis Data - equal to the cases and options. They can also share the same labels if labels are included.

Each row consists of probabilities or percentages which sum to unity or 100%. The data is tested to ensure this criterion is met to within plus or minus 1%.

Priors Data is only evaluated and applied in an analysis when an informed priors analysis is requested (even though it may be included in the data output).

Tie Weights Data input

Tie weights are an alternative to sharing the option probabilities equally amongst tied selections or treating them as missing.

They do not apply to comparative selections based on probabilities nor in blockage only analyses.

Tie Weights Data always consists of a single row of percentages or proportions, one for each option, that sum to 100 or 1. You can enter the data either way. The data is tested to ensure validity. Do not include labels.

Tie Weights Data is only evaluated and applied when the "Treat as weighted selections" optionbutton is selected from amongst the "Treatment of ties" options.

See also treatment of tied selections.


8. Using labels

When the "Labels in first row" checkbox is checked, all input ranges will be assumed to have labels in the first row (except for Tie Weights Data).

Labels for Analysis Data are expected to be complete and unique within its set of labels.

Labels for Other Data and Weight Data do not have to be complete or unique - but the first row will be assumed to include them if the labels checkbox is checked.

Labels for Priors Data may be the same as for Analysis Data. Priors Data labels are not tested for completeness or uniqueness but the first row will be assumed to include them if the labels checkbox is checked.

When the "Labels in first row" checkbox is left unchecked a set of generic labels will be automatically supplied for the input ranges.

If the labels checkbox is unchecked and there are labels in the first row, an error message will be generated.


9. Applying sample weights

To apply weight data (using the data entered as a weight range), check the "Apply sample weights" checkbox.

Whenever weights are applied they are normalised to an average weight of one for the cases in the analysis. (Adjusted weights will be shown in the relevant output tables.)

Whenever weights are applied any cases with weights missing are eliminated from the analysis.
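The two rules above (dropping cases with missing weights, then normalising to an average weight of one) can be sketched as follows. This is a minimal illustration in Python, not the program's code; representing a missing weight as None is an assumption made for the example.

```python
def normalise_weights(weights):
    """Drop cases with missing weights and scale the rest to a mean of one.

    `weights` is a list with None for a missing weight (an assumption).
    Returns (kept_case_indices, normalised_weights).
    """
    kept = [(i, w) for i, w in enumerate(weights) if w is not None]
    if not kept:
        raise ValueError("no cases have a weight")
    mean = sum(w for _, w in kept) / len(kept)
    if mean == 0:
        raise ValueError("weights sum to zero")
    return [i for i, _ in kept], [w / mean for _, w in kept]

indices, adjusted = normalise_weights([2.0, None, 1.0, 3.0])
print(indices)   # the second case (index 1) has been dropped
print(adjusted)  # the remaining weights now average one
```

Because the average is taken only over the cases kept, the adjusted weights depend on which cases remain in the current analysis.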

See also selecting weight data


10. Comparative selection data types

Comparative selection data provides, directly or indirectly, the individual's selection expectation.

All comparative selection data input must be numeric.

Click on the relevant optionbutton for the type of input data.

The comparative selection data may show comparative preference explicitly (e.g. rank, probability or stated preference or non-preference) or, implicitly (e.g. a rating) where the preference can be inferred.

Boolean or dichotomous data, ratings and rankings

Note that partial rankings, ratings, etc. (where data is not explicitly collected on every option) will work quite satisfactorily for analytical purposes so long as all options implicitly evaluated are assigned a numeric value that will imply non-selection.

Boolean or dichotomous data, ratings and rankings may all include ties. All cases where more than one option has the "most preferred" value are treated as ties. A selection of procedures is available for handling these cases. See treatment of tied selections.

For these data types it is also necessary to specify whether the largest or smallest values provide the criterion for selection. For example, for ranking from first to last (1 to n) the smallest value is selected. See values used to infer selection.

Probability data

Probability data has the concept of ties "built-in" to the comparative selection process. Selection probabilities may be viewed as weighted ties so that any non-zero probability implies proportionate option selection. Indeed, all probability selections are proportional to the probabilities.

Zero to one and zero to one hundred scales are available for data processing. See types of probability scale.

Probability data is tested for validity. Each probability value must lie within the scale range i.e. 0-1 or 0-100, as applicable. And the sum of the probabilities for each case must add to the maximum of the range within plus or minus one percent i.e. .99 to 1.01 or 99 to 101, as applicable.

When data falls outside this range you will be given the option of proceeding, but if you do, cases with invalid data will be removed and counted as missing.
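The validity test for probability data described above can be sketched like this. It is an illustrative Python check under the stated rules (values within the scale range, total within one percent of the scale maximum); the function and parameter names are hypothetical.

```python
def valid_probability_row(row, scale_max=1.0, tolerance=0.01):
    """Check one case's probabilities against the selected scale.

    `scale_max` is 1.0 for the 0-1 scale or 100.0 for the 0-100 scale.
    """
    # Every value must lie within the scale range.
    in_range = all(0 <= p <= scale_max for p in row)
    # The total must be within plus or minus one percent of the maximum.
    # A tiny allowance absorbs floating-point rounding at the boundary.
    total_ok = abs(sum(row) - scale_max) <= tolerance * scale_max + 1e-12
    return in_range and total_ok

print(valid_probability_row([0.2, 0.3, 0.5]))               # within range, sums to 1
print(valid_probability_row([20, 30, 49.5], scale_max=100))  # sums to 99.5 on the 0-100 scale
print(valid_probability_row([0.6, 0.6]))                     # sums to 1.2
```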


11. Types of probability scale

Two probability scales are available for input of probability data. These are the standard 0-1 scale and a 0-100 scale, usually presented in terms of percentage probabilities.

As probability data is tested for validity in terms of these scales, any probability data input which differs from them will need to be appropriately transformed.

Click on the relevant optionbutton to determine the scale for probability data input.

See also probability data


12. Values used to infer selection

Setting the nature of the criterion value for selection is necessary for boolean or dichotomous data and for ratings and rankings.

Depending on how the data was collected, the largest or smallest value may indicate the criterion for selection of an option. For example, for rankings in terms of first, second, etc. (1 to n), the lowest value will be the criterion for selection. For rating scales, commonly the largest value is the basis for selection.
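A short Python sketch of this rule is given below. It assumes the values passed in are the comparative values for one case's unblocked options; the function name and the "largest"/"smallest" strings are illustrative choices, not the program's interface.

```python
def selected_options(values, criterion="largest"):
    """Return the positions of the option(s) holding the criterion value.

    More than one position in the result indicates a tie.
    """
    if criterion not in ("largest", "smallest"):
        raise ValueError("criterion must be 'largest' or 'smallest'")
    target = max(values) if criterion == "largest" else min(values)
    return [i for i, v in enumerate(values) if v == target]

# Rankings from first to last: the smallest value is selected.
print(selected_options([2, 1, 3], criterion="smallest"))
# Ratings where the largest value wins: the first two options tie.
print(selected_options([5, 5, 3], criterion="largest"))
```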

Click on the relevant optionbutton to determine whether the largest or smallest comparative value is to be the selection criterion.

See also Comparative selection data types


13. Treatment of tied selections

Tied selections are likely to arise in the course of comparative selection when applying boolean, dichotomous, rating and rank data types. Ties can be treated as equally shared selections, weighted selections, or, as missing.

Click on the relevant ties treatment optionbutton to choose the method for treatment of ties in the current analysis.

Treating ties as equal selections

For equally shared selections the expected selection probability of one is divided equally amongst the tied selections.

The assumption here (as for weighted tied selections) is that a choice would actually be from amongst the ties i.e. selection would not be blocked by indecision.

Treating ties as weighted selections

Consideration of weighted ties relates solely to option expectation probabilities and their adjustment.

If a substantively significant proportion of boolean, dichotomous, rated or ranked data cases have tied selections, and sharing these tied selections equally between options is viewed as unrealistic, weighting such ties will be preferred. Researchers will differ on the weighting scheme to be applied.

One possible approach uses a sub-sample of the data, specifically those cases that have made a decisive comparative selection. For example, running an analysis on cases with the ties excluded (e.g. treating them as missing), and also excluding those cases making automatic selections, is likely to yield a share distribution providing realistic weights. These share weights can be input as a range in the Tie Weights range textbox. Then re-run the analysis for the total sample with the weighted ties optionbutton on.

The tie weights are applied only to the corresponding tied options for such cases. So, if the weights for an analysis with three options are: 0.10, 0.30, 0.60, and the tied options for Case A are Options 1 and 2, then these options' selection probabilities would be 0.25 and 0.75 with the total always adding to one.
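The equal and weighted treatments of ties can be sketched together in Python. This illustrates the arithmetic in the example above and is not the program's implementation; the function name and arguments are hypothetical.

```python
def tie_probabilities(n_options, tied, tie_weights=None):
    """Share a selection probability of one amongst the tied options.

    `tied` lists the positions of the tied options. With no `tie_weights`
    the share is equal; otherwise each tied option receives its weight
    divided by the total weight of the tied options only.
    """
    probabilities = [0.0] * n_options
    if tie_weights is None:
        for i in tied:
            probabilities[i] = 1.0 / len(tied)
    else:
        total = sum(tie_weights[i] for i in tied)
        for i in tied:
            probabilities[i] = tie_weights[i] / total
    return probabilities

# Case A: Options 1 and 2 (positions 0 and 1) are tied.
print(tie_probabilities(3, [0, 1]))                      # equal shares
print(tie_probabilities(3, [0, 1], [0.10, 0.30, 0.60]))  # roughly 0.25 and 0.75
```

Only the weights of the tied options enter the calculation, which is why the 0.60 weight for Option 3 has no effect in this example.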

Treating ties as missing

Where only firm selections are wanted, ties can be treated as missing data and the tied cases will be removed from the analysis.

Here, ties might be interpreted as showing selection uncertainty. By contrast with the above interpretations, ties may be taken to suggest equally or unequally likely or alternating selections, or indecision (decision incapacity, i.e. blockage).

Different types of ties

It is possible to have a variety of types of ties across cases, and even within cases. For example, ties can reflect alternating selections, equality of appeal, uncertainty or unresolved equivocation. Thus, some varieties of tie may validly represent comparative selections while others represent blockage. In some data collections there will be a predominance of one or the other but this will not always be evident. Where such potential ambiguities are likely to be significant or of particular interest, they may be best resolved by appropriate collection of data in the field, on the presence of the relevant types of blockage or comparative selection, coded accordingly, and then analysed appropriately.

Note that analogous thinking can be applied to the interpretation of probabilities.


14. Assigning missing values

Supply the code for any missing data. The missing data code only applies to the Analysis Data. (Invalid or missing Weight Data or Priors Data cases are eliminated before an analysis commences, where they apply, and reported separately.)

The code may be in words, letters or numbers and include spaces.

The code will be treated as text but not as case sensitive.

The code should be unique amongst all the analysis data.

Every case with a missing value will be deleted from the analysis.

Statistics on missing values are supplied with the results.

Significant numbers of missing values may undermine the quality of the analysis.


15. Setting blockage codes

Up to six blockage codes may be specified.

Blockage codes may be in words, letters or numbers and include spaces.

The codes will be treated as text but are not case sensitive.

The codes should be unique amongst all the analysis data.

If you have more than six blockage codes, temporarily combine the sixth and subsequent codes into a single collective code for a first analysis. Then re-run with the previously unanalysed codes separated and the already analysed codes combined into a collective code. The results from the two runs are additive with respect to the blockage components and can be amalgamated manually into a single table.


16. Analysis output

For every analysis at least two Excel sheets are output:

1) General analysis of blockage or of both blockage and comparative selections, providing:

  1. documentation of the model run,
  2. information on deleted cases where weight or priors data is missing,
  3. the composition of the sample across the options with respect to blockage and comparative selections, and
  4. the effects of blockage and comparative selections on shares.

2) Processed input data:

  1. listing cases with non-missing data,
  2. weight data if selected (adjusted if part of the analysis),
  3. priors data if selected (adjusted if part of the analysis), and
  4. any other data selected that you wished to retain.

If you wish to identify the positive and negative effects of automatic selections on blockage then check the "Show automatic selections" checkbox. The table for these results will appear on the general analysis page preceding the full blockage table. These automatic selections are the ones which occur when all other alternatives are blocked, e.g. when all other options are ignored. These results can be compared with the following table to identify the proportion of blockage effects accounted for by automatic selections.

If you wish to know the statistical significance of each option's blockage and comparative selection coefficients then check the "Show statistical significance" checkbox. Significance is indicated by changes in the font size and styles of the results. For more about confidence intervals, statistical significance, and their presentation see Statistical significance of option coefficients.

If you wish to produce additional sheets showing case-by-case results for:

  1. the effects of blockage or comparative selection on selection probabilities, or
  2. the separate effects of each blockage,

then check the checkbox "Include casewise effects in output".

These casewise results are adjusted for tie weights (where they apply) but not for sample weighting.

If you wish to produce a sheet retaining a copy of the raw input data as part of the analysis output then check the checkbox for "Include unprocessed input data in output".


17. Statistical significance of option coefficients

The statistical significance of analyses is obtained by checking the "Show statistical significance" checkbox.

Statistical tests are provided for each option's blockage and comparative selection coefficients for the sample. They are also provided for the share impacts of automatic selections.

The standard error for a blockage coefficient is the se of a proportion for the sum of the first two terms in the selection equation. The coefficients are based on simple random sampling assumptions applied to both unweighted and weighted results.

The standard error for a comparative selection coefficient is the se of a proportion for the sum of the first three terms of the selection equation for the sample.

With tables showing losses and gains the procedure is a little more complex. The losses and gains are each averaged across the options. The respective averages are subtracted from the corresponding option coefficients to yield zero means for each row. The standard errors of the adjusted coefficients are then calculated as described above. Significant coefficients indicate the presence of differential or interaction effects for losses and gains across the options. These results are associated with the unadjusted loss and gain coefficients in the way described below for the other coefficients.

A key test is whether a given confidence interval based on the se is less than the size of the coefficient being evaluated. This approach is used for testing the statistical significance of the blockage and comparative coefficients which are displayed in the output as follows:

1) At the 0.05 level - coefficients are shown in a larger bold font.

2) At the 0.01 level - coefficients are shown in the same larger bold font but also in italics.

All the tests are two-tailed.

Note: The se of a proportion = sqrt [p * (1 - p) / n ].
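The test described in this section can be sketched in Python as follows. The se formula is the one given in the note above. The critical values 1.960 and 2.576 are an assumption (the usual two-tailed normal values for the 0.05 and 0.01 levels); the help text does not state which multipliers the program uses, and the function names are hypothetical.

```python
import math

def se_proportion(p, n):
    """Standard error of a proportion: sqrt[p * (1 - p) / n]."""
    return math.sqrt(p * (1 - p) / n)

def significance_level(coefficient, p, n):
    """Return 0.01, 0.05, or None for a coefficient.

    `p` is the proportion the se is based on (for a blockage coefficient,
    the sum of the first two terms of the selection equation) and `n` is
    the sample size. The z values below are assumed, not documented.
    """
    se = se_proportion(p, n)
    for level, z in ((0.01, 2.576), (0.05, 1.960)):
        # Significant when the interval half-width is less than the coefficient size.
        if z * se < abs(coefficient):
            return level
    return None

print(se_proportion(0.5, 100))             # approximately 0.05
print(significance_level(0.20, 0.5, 100))  # significant at the 0.01 level
print(significance_level(0.11, 0.5, 100))  # significant at the 0.05 level only
print(significance_level(0.05, 0.5, 100))  # not significant
```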