Underlying model and math

The underlying model

We provide a web application, a command-line application, and a Java API for a process in which a researcher (the moderator or coordinator) carries out a survey, inviting a number of experts to provide their statistical estimates of the behaviour of a group ("population") of illegal border crossers. It is assumed that a number of persons (the "population") have once been apprehended as illegal border crossers and removed from the country. Some percentage of them will attempt to cross the border again, and some percentage of those will be apprehended.

Observable values

During the $K$ consecutive periods ($T_1, T_2, \dots, T_K$) following the removal, some of the removed persons will be apprehended again when trying to cross the border. The numbers of these apprehensions, $b_1, b_2, \dots, b_K$, measured as fractions of the original "population" of removed persons, are the only observable values we have. Other members of the "population" may attempt to cross the border even later (after the end of the $K$ observation periods, i.e. during the future time for which we have no observable data), and some will never attempt another crossing (we say that they are "deterred").

In this model, we simplify reality by assuming that no one makes more than one repeat crossing attempt, i.e. every member of the population attempts to cross either during one of the $K$ periods, "even later", or never.

Aggregation

The researcher is interested in the number of people who attempt a crossing during each time period. This number is not directly observable: we only know the number of apprehensions in each of the $K$ periods (i.e. the number of attempted crossings multiplied by the apprehension rate), not the number of people who crossed without being apprehended.

(We also assume that the apprehension rate is the same during all $K$ periods, which means that the relative values of the attempt rates in different periods are the same as the relative values of the observed apprehension numbers. For example, if twice as many people from our "population" were apprehended in December as in August, then we believe that twice as many people attempted a crossing in December as in August. This, of course, is a simplifying assumption.)
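Writing $r$ for the apprehension rate and $a_i$ for the fraction of the population that attempts a crossing in period $T_i$ (the notation used for the experts' estimates further on), the assumption says that, in expectation, each observed apprehension number is the attempt rate scaled by the same factor $r$:

$$b_i = r\,a_i \quad (i = 1, \dots, K) \qquad\Longrightarrow\qquad \frac{b_i}{b_j} = \frac{r\,a_i}{r\,a_j} = \frac{a_i}{a_j}.$$

With purely illustrative numbers, $r = 0.5$ and attempt rates of 8% in December and 4% in August would give apprehension rates of 4% and 2%; both ratios equal 2 because the common factor $r$ cancels. The observations therefore fix only the relative attempt rates; their absolute level still depends on the unknown $r$.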

To form an educated guess of the values that are not directly observable (the crossing attempt rates in various periods, and the apprehension rate), the researcher (coordinator) canvasses a number of experts, asking them to provide their estimates of these unobservable numbers. The experts are not shown the known observables (the "ground truth" data).

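For a given number ($K$) of consecutive observation periods, the expert will be asked to provide estimates of:

  1. the attempted crossing rates $a_1, \dots, a_K$, i.e. the fractions of the population that attempt a repeat crossing during each of the periods $T_1, \dots, T_K$;
  2. the rate $a_{K+1}$ of those who attempt a crossing only after the end of $T_K$;
  3. the deterrence rate $a_{K+2}$ of those who never attempt another crossing;
  4. the apprehension rate $r$.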

Since it is assumed that no one makes more than one crossing attempt after the initial removal, the $K+2$ rates in items 1-3 above must sum to 1:

$$a_1 + \dots + a_K + a_{K+1} + a_{K+2} = 1.$$

For the convenience of the participating experts, rates in this application are entered multiplied by 100 (i.e. one enters 5, rather than 0.05, to mean "5% of the population"). Thus the $K+2$ values entered by the expert in items 1-3 above must sum to 100.
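The input rule can be sketched in Python as a small validation step that turns an expert's percentage entries back into fractions. This is an illustrative sketch, not the application's actual code or its Java API: the function name `to_fractions`, the `tol` parameter, and the choice to reject negative entries are assumptions.

```python
import math
from typing import List, Sequence


def to_fractions(percentages: Sequence[float], k: int, tol: float = 1e-6) -> List[float]:
    """Convert an expert's K+2 percentage entries into fractions of the population.

    Hypothetical helper: `percentages` holds a_1..a_K (attempts during T_1..T_K),
    a_{K+1} (attempts after T_K) and a_{K+2} (deterred), entered as percentages.
    """
    if len(percentages) != k + 2:
        raise ValueError(f"expected {k + 2} values, got {len(percentages)}")
    if any(p < 0 for p in percentages):
        raise ValueError("percentages must be non-negative")
    # The K+2 categories partition the population, so the entries must total 100.
    total = math.fsum(percentages)
    if abs(total - 100.0) > tol:
        raise ValueError(f"entries sum to {total}, not 100")
    return [p / 100.0 for p in percentages]
```

For example, with $K = 3$, `to_fractions([10, 5, 5, 30, 50], k=3)` returns `[0.1, 0.05, 0.05, 0.3, 0.5]`, while entries summing to 90 are rejected.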

Processing the experts' estimates

While the experts are not told the actual observed values, the coordinator has the observables at his or her disposal and uses them to evaluate the quality of the experts' estimates. The following procedure is used:

Weighting of experts.

  1. For each expert, we compute the vector of observables that would be expected if the expert's estimates were correct. It consists of $K+1$ numbers: the $K$ expected apprehension numbers for the $K$ periods (for period $T_i$ this is the attempt rate multiplied by the apprehension rate, $r\,a_i$), and the number of people who are not apprehended during these $K$ periods (because they never attempt to cross, attempt to cross after the end of $T_K$, or cross during one of the $K$ periods without being apprehended). The numbers are expressed as fractions of the total "population" and therefore sum to 1.0.
  2. For each expert, the vector of observables computed for this expert in the previous step is compared with the "ground truth" observables (i.e. the actual apprehension numbers during the $K$ periods, and the number of people who were not apprehended, also expressed as fractions of the total "population"). The two $(K+1)$-dimensional vectors can be viewed as probability distributions. As the measure of dissimilarity $d(p,q)$ between two such distributions $p$ and $q$ we use the Normalized Jensen-Shannon divergence (NJSD), defined as the Jensen-Shannon divergence (JSD) between $p$ and $q$ divided by the maximum possible value of the JSD, which is $\log 2$. With $m = (p+q)/2$ and $D_{\mathrm{KL}}$ the Kullback-Leibler divergence,
    $$d(p,q) = \frac{\mathrm{JSD}(p,q)}{\log 2}, \qquad \mathrm{JSD}(p,q) = \tfrac{1}{2} D_{\mathrm{KL}}(p \,\|\, m) + \tfrac{1}{2} D_{\mathrm{KL}}(q \,\|\, m).$$
    Over all possible distributions $p$ and $q$, $d(p,q)$ ranges between 0 and 1. The value 0 is obtained only when the two distributions are identical; the value 1 is reached only when the two distributions are so different that $p$ has non-zero values only in the components where $q$ has zero values, and vice versa.
  3. Based on the NJSD values computed for the experts' predictions, a weight is assigned to each expert's prediction. Two modes are supported for computing weights:
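Steps 1 and 2 above can be sketched in Python as follows. The sketch assumes an expert's estimates are already fractions (percentages divided by 100), with `a` holding $a_1, \dots, a_{K+2}$ and `r` the apprehension rate; the function names are hypothetical and do not describe the actual Java API. It uses natural logarithms: since the JSD is divided by $\log 2$ taken in the same base, the normalized value does not depend on the base. The weighting modes of step 3 are not reproduced.

```python
import math
from typing import List, Sequence


def expected_observables(a: Sequence[float], r: float, k: int) -> List[float]:
    """Step 1: the K+1 observables implied by one expert's estimates.

    `a` holds a_1..a_{K+2} as fractions summing to 1; `r` is the apprehension rate.
    """
    apprehended = [r * a[i] for i in range(k)]        # expected b_1..b_K
    not_apprehended = 1.0 - math.fsum(apprehended)    # everyone else
    return apprehended + [not_apprehended]


def observed_vector(b: Sequence[float]) -> List[float]:
    """Ground truth: b_1..b_K plus the fraction not apprehended during T_1..T_K."""
    return list(b) + [1.0 - math.fsum(b)]


def _kl(p: Sequence[float], m: Sequence[float]) -> float:
    # Kullback-Leibler divergence D(p || m) in nats; terms with p_i = 0 contribute 0.
    # m_i >= p_i / 2 > 0 whenever p_i > 0, so the division is safe.
    return math.fsum(pi * math.log(pi / mi) for pi, mi in zip(p, m) if pi > 0)


def njsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Step 2: Jensen-Shannon divergence of p and q divided by its maximum, log 2."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    jsd = 0.5 * _kl(p, m) + 0.5 * _kl(q, m)
    return jsd / math.log(2.0)
```

For one expert, the dissimilarity would then be `njsd(expected_observables(a, r, k), observed_vector(b))`, where `b` holds the observed $b_1, \dots, b_K$ as fractions. Identical vectors give 0, and vectors with disjoint non-zero components, such as `[1.0, 0.0]` and `[0.0, 1.0]`, give 1.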
Averaging.

A weighted average of all predictions is computed, with the apprehension rate

$$r^* = \frac{\sum_x w(x)\, r(x)}{\sum_x w(x)},$$

and the attempted crossing rates

$$a_i^* = \frac{\sum_x w(x)\, a_i(x)}{\sum_x w(x)},$$

where $w(x)$ is the weight assigned to expert $x$, and $r(x)$ and $a_i(x)$ (for $i = 1, \dots, K+2$) are the estimates, provided by expert $x$, of the apprehension rate $r$ and of the attempted crossing rate $a_i$ (or the deterrence rate, for $i = K+2$), respectively.
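A minimal Python sketch of these weighted averages, assuming the weights $w(x)$ have already been produced by one of the weighting modes (which are not reproduced here) and that all estimates are fractions. The function name and argument layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def weighted_average_estimates(
    weights: Sequence[float],
    rates: Sequence[float],
    attempts: Sequence[Sequence[float]],
) -> Tuple[float, List[float]]:
    """Return (r*, [a*_1, ..., a*_{K+2}]) as weighted means over the experts.

    weights[x] is w(x), rates[x] is r(x), and attempts[x] lists a_1(x)..a_{K+2}(x).
    """
    total_w = math.fsum(weights)
    if total_w <= 0:
        raise ValueError("the weights must have a positive sum")
    r_star = math.fsum(w * r for w, r in zip(weights, rates)) / total_w
    n_rates = len(attempts[0])
    a_star = [
        math.fsum(w * a[i] for w, a in zip(weights, attempts)) / total_w
        for i in range(n_rates)
    ]
    return r_star, a_star
```

Because each expert's $a_1(x), \dots, a_{K+2}(x)$ sum to 1 and the weights are normalized by their sum, the averaged rates $a_i^*$ also sum to 1.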

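The "missed to captured ratio" (MCR),

$$\mu = \frac{1-r}{r} = \frac{1}{r} - 1,$$

is computed for each expert estimate. It represents the expert's estimate of the ratio of the number of crossers who evaded capture to the number of apprehended crossers; for example, an apprehension rate of 25% gives $\mu = 3$, i.e. three crossers evade capture for every one apprehended.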

In addition to the averaged estimates of the crossing rates and apprehension rates, the system also computes the weighted average of the "missed to captured ratios", i.e.

$$\mu^* = \frac{\sum_x w(x)\, \mu(x)}{\sum_x w(x)}.$$

Note that $\mu^*$ is normally not equal to $1/r^* - 1$, because the average of the inverse values is not equal to the inverse of the average. In fact, since $1/r$ is a strictly convex function of $r$ for $r > 0$, Jensen's inequality implies that, unless all experts with non-zero weight predicted the same apprehension rate, the averaged MCR is greater than the MCR computed from the averaged apprehension rate.
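The following Python sketch computes per-expert MCRs and their weighted average, which makes the inequality easy to check on small examples. The function names are illustrative assumptions, and apprehension rates are taken as fractions in $(0, 1]$.

```python
import math
from typing import Sequence


def mcr(r: float) -> float:
    """Missed-to-captured ratio mu = (1 - r) / r for an apprehension rate 0 < r <= 1."""
    if not 0.0 < r <= 1.0:
        raise ValueError("the apprehension rate must lie in (0, 1]")
    return (1.0 - r) / r


def weighted_mcr(weights: Sequence[float], rates: Sequence[float]) -> float:
    """mu* = sum_x w(x) mu(x) / sum_x w(x): the weighted mean of per-expert MCRs."""
    return math.fsum(w * mcr(r) for w, r in zip(weights, rates)) / math.fsum(weights)
```

With two equally weighted hypothetical experts estimating $r = 0.25$ and $r = 0.75$, the per-expert MCRs are 3 and 1/3, so $\mu^* = 5/3$, whereas the averaged rate $r^* = 0.5$ gives $1/r^* - 1 = 1$. If both experts give the same rate, the two quantities coincide.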

Along with the weighted averages, standard deviations are computed for each value.

