
Department of Mathematics and Statistics, University of Regina, Canada

- Corresponding Author:
- Yang Zhao

Department of Mathematics and Statistics

University of Regina, College West 307.14

Regina, SK, S4S 0A2, Canada

**Tel:** 306 585-4348

**Fax:** 306 585-4020

**E-mail:** [email protected]

**Received date:** February 07, 2017; **Accepted date:** February 20, 2017; **Published date:** February
27, 2017

**Citation:** Zhao Y (2017) Sampling and Estimation in Hidden Population Using Social
Network. J AIDS Clin Res 8:667. doi:10.4172/2155-6113.1000667

**Copyright:** © 2017 Zhao Y. This is an open-access article distributed under the
terms of the Creative Commons Attribution License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original author and
source are credited.

**Visit for more related articles at** Journal of AIDS & Clinical Research

Characteristics of hidden populations (e.g. the population of injection drug users) cannot be studied using standard sampling and estimation procedures. This article considers methods for estimating the population proportion of a hidden population using its social network. We compare the sampling and estimation technique of respondent-driven sampling with the simplified sampling procedure based on a Markov-chain model and discuss the equivalence of these procedures. Neither procedure provides a formula for estimating the variance of its estimator, owing to the complexity of the methods. We describe a simplified sampling procedure for collecting data on both the population and its social network, and provide a simple formula to estimate the population proportion efficiently. We further derive a formula for estimating the variance of the proposed estimator using the delta method. A simulation study illustrates the new sampling and estimation method.

Hidden population; Markov-chain model; Population proportion; Respondent-driven sampling; Social network relationships.

Special populations that cannot be studied using standard sampling and estimation procedures are called hidden populations. Examples include injection drug users, men who have sex with men, illegal immigrants, and the homeless. Consistent estimation of the size of these populations is crucial for researchers and policy makers.

Salganik and Heckathorn [1] provide a comprehensive review of sampling and estimation methods for studying hidden populations, including targeted sampling [2] and time-space sampling [3]. They note that these methods often fail to provide accurate estimates of the true values. They further point out that a common drawback of most methods is that they fail to use the social network relationships present in many hidden populations, that is, the network of relationships among the people in the population, for example the network of friendships. They propose a sampling and estimation method based on a snowball-type sampling design [4], called respondent-driven sampling, which uses the social network relationships in a hidden population to collect information from the population of interest such that unbiased estimates of the population characteristics are possible. However, the consistency of their estimator depends on the assumption that individuals are randomly recruited into the study. Another problem with multi-wave or snowball-type sampling designs is their cost in both time and money.

To avoid the multi-wave sampling procedure, Zhao [5] introduces a Markov-chain model for the social network relationships. The method computes the long-run transition probabilities using Markov-chain theory and then estimates the population proportion using the result of Salganik and Heckathorn [1]. However, neither Salganik and Heckathorn [1] nor Zhao [5] provides formulae for estimating the variances of their estimators, owing to the complexity of their sampling and estimation procedures.

In this article we describe a simplified sampling procedure to collect information on both the population and its social network relationships simultaneously. We derive a consistent estimator of the population proportion of a hidden population based on the simplified sampling design. The rest of the article is organized as follows. In Section 2 we briefly review the respondent-driven sampling method of Salganik and Heckathorn [1] and the Markov-chain model of Zhao [5], and discuss the equivalence of these two approaches. A simplified sampling and estimation procedure is described in Section 3, where a formula for estimating the variance of the proposed estimator is also derived. Section 4 provides a simulation study to examine the small-sample performance of the proposed method. Section 5 concludes with a brief discussion.

**Respondent-driven sampling**

The main idea of respondent-driven sampling (Heckathorn [6]; Salganik and Heckathorn [1]) is summarized as follows.

Assume that a population is divided into two groups, A and B, which are connected by social network relationships, say friendships. Let N_{A} and N_{B} be the total numbers of people in groups A and B respectively, and let P_{A}=N_{A}/(N_{A} + N_{B}) and P_{B}=1 − P_{A} be the finite population proportions of groups A and B respectively. The objective is to estimate P_{A} and P_{B}. Let D_{Ai} be the number of friendships of the i^{th} individual in group A, also called the degree of the i^{th} individual. The total number of friendships radiating from individuals in group A is

$$R_{A} = \sum_{i=1}^{N_{A}} D_{Ai}.$$

Define

$$C_{AB} = \frac{T_{AB}}{R_{A}}, \qquad \bar{D}_{A} = \frac{R_{A}}{N_{A}}. \tag{1}$$

Here T_{AB}=T_{BA} is the total number of friendships radiating from members in group A to group B or vice versa, and $\bar{D}_{A}$ is called the average degree of people in group A. $\bar{D}_{B}$ and C_{BA} are defined similarly. Salganik and Heckathorn [1] show that

$$P_{A} = \frac{\bar{D}_{B} C_{BA}}{\bar{D}_{A} C_{AB} + \bar{D}_{B} C_{BA}}, \qquad P_{B} = \frac{\bar{D}_{A} C_{AB}}{\bar{D}_{A} C_{AB} + \bar{D}_{B} C_{BA}}. \tag{2}$$

In practice, respondents are selected from the social network based on the respondent-driven sampling design of Heckathorn [6], where a small number of initial seeds is selected first, then current seeds randomly recruit other friends into the sample, and the recruiting process continues until the required sample size is reached. Let r_{AB} be the total number of recruitments from individuals in group A to individuals in group B, r_{AA} be the total number of recruitments from individuals in group A to other individuals in the same group, and similarly for r_{BA} and r_{BB}. Based on the random recruitment assumption, C_{AB}, C_{BA}, $\bar{D}_{A}$ and $\bar{D}_{B}$ can be consistently estimated by

$$\hat{C}_{AB} = \frac{r_{AB}}{r_{AA} + r_{AB}}, \quad \hat{C}_{BA} = \frac{r_{BA}}{r_{BA} + r_{BB}}, \quad \hat{\bar{D}}_{A} = \frac{n_{A}}{\sum_{i=1}^{n_{A}} 1/D_{Ai}}, \quad \hat{\bar{D}}_{B} = \frac{n_{B}}{\sum_{i=1}^{n_{B}} 1/D_{Bi}} \tag{3}$$

respectively. Here n_{A} and n_{B} are the total numbers of individuals selected from groups A and B respectively, and the sums run over the selected individuals. Then the population proportions can be estimated by substituting (3) into (2). Salganik and Heckathorn [1] show that these estimators are asymptotically unbiased regardless of how the initial seeds are selected.
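As an illustration, the estimation step described above can be sketched in a few lines of Python. The recruitment counts and degree lists are made-up numbers, and the function names `harmonic_mean_degree` and `rds_proportions` are hypothetical. The average degrees are estimated here by the harmonic mean of the reported degrees, which is assumed to be the degree estimator of Salganik and Heckathorn [1].

```python
def harmonic_mean_degree(degrees):
    """Estimate a group's average degree as n / sum(1/d_i) (assumed estimator)."""
    return len(degrees) / sum(1.0 / d for d in degrees)


def rds_proportions(r_aa, r_ab, r_ba, r_bb, degrees_a, degrees_b):
    """Return (P_A_hat, P_B_hat) from recruitment counts and reported degrees."""
    c_ab = r_ab / (r_aa + r_ab)   # share of A's recruitments that go to B
    c_ba = r_ba / (r_ba + r_bb)   # share of B's recruitments that go to A
    d_a = harmonic_mean_degree(degrees_a)
    d_b = harmonic_mean_degree(degrees_b)
    denom = d_a * c_ab + d_b * c_ba
    return d_b * c_ba / denom, d_a * c_ab / denom


# Hypothetical data: recruitment counts and degrees of sampled members.
p_a_hat, p_b_hat = rds_proportions(
    r_aa=30, r_ab=20, r_ba=25, r_bb=75,
    degrees_a=[10, 20, 40, 20], degrees_b=[5, 10, 10, 20],
)
print(round(p_a_hat, 4), round(p_b_hat, 4))
```

The two estimates always sum to one, because they share the same denominator.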

**The Markov-chain model**

An important contribution of Zhao [5] is a one-wave sampling design to collect information about the population and its social network relationships. In this design selected individuals are required to recruit all their friends into the study, information on how many friendships they have in group A and in group B is recorded separately, and the random recruitment assumption is not required. Furthermore, Zhao [5] describes a Markov-chain model for the social network relationships. Instead of using groups A and B, the model defines two states, A and B, and it is assumed that each individual is in either state A or state B but not both. Suppose individuals are selected using the respondent-driven sampling design, and let P_{AB} be the probability that a randomly selected individual in state A will recruit an individual in state B, and P_{AA}=1−P_{AB} be the probability that a randomly selected individual in state A will recruit an individual in state A. P_{BA} and P_{BB} are defined similarly. Then the transition probability matrix for a first-order Markov-chain model can be denoted as

$$P = \begin{pmatrix} P_{AA} & P_{AB} \\ P_{BA} & P_{BB} \end{pmatrix}.$$

Under the condition that P is an ergodic, irreducible transition matrix, in the long run the probability that an individual in state A will be selected is

$$\pi_{A} = \frac{P_{BA}}{P_{AB} + P_{BA}}, \tag{4}$$

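and π_{B}=1−π_{A}. To estimate the population proportions, Zhao [5] recommends using the results of Salganik and Heckathorn [1] as

$$P_{A} = \frac{\pi_{A} \bar{D}_{B}}{\pi_{A} \bar{D}_{B} + \pi_{B} \bar{D}_{A}}, \qquad P_{B} = \frac{\pi_{B} \bar{D}_{A}}{\pi_{A} \bar{D}_{B} + \pi_{B} \bar{D}_{A}}. \tag{5}$$

In practice, to compute estimates of the population proportions P_{A} and P_{B} using the above formulae we need to estimate $\bar{D}_{A}$, $\bar{D}_{B}$, P_{AB} and P_{BA}. However, if we substitute (4) into (5) directly, we get

$$P_{A} = \frac{\bar{D}_{B} P_{BA}}{\bar{D}_{A} P_{AB} + \bar{D}_{B} P_{BA}}, \qquad P_{B} = \frac{\bar{D}_{A} P_{AB}}{\bar{D}_{A} P_{AB} + \bar{D}_{B} P_{BA}}. \tag{6}$$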

Comparing the estimators in (2) and (6), it is easy to see that P_{AB}, P_{BA} and C_{AB}, C_{BA} are essentially measuring the same quantities in the two different models, and the two methods are therefore equivalent.
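A small numerical check of this equivalence can be sketched as follows. The transition probabilities and average degrees are arbitrary illustrative values, the helper names are hypothetical, and the degree-weighted form used in `proportion_from_pi` is an assumption about how the long-run probabilities are combined with the average degrees. The code iterates the two-state chain to its long-run distribution, compares it with the closed form P_{BA}/(P_{AB} + P_{BA}), and then confirms that both routes give the same proportion.

```python
def long_run_prob_a(p_ab, p_ba, steps=200):
    """Iterate a two-state chain from state A; return the long-run P(state A)."""
    prob_a = 1.0
    for _ in range(steps):
        # P(next = A) = P(now = A) * P_AA + P(now = B) * P_BA
        prob_a = prob_a * (1.0 - p_ab) + (1.0 - prob_a) * p_ba
    return prob_a


def proportion_from_pi(pi_a, d_a, d_b):
    """Degree-weighted proportion of group A (assumed form of the estimator)."""
    pi_b = 1.0 - pi_a
    return pi_a * d_b / (pi_a * d_b + pi_b * d_a)


def proportion_from_transitions(p_ab, p_ba, d_a, d_b):
    """Same proportion written directly in terms of P_AB and P_BA."""
    return d_b * p_ba / (d_a * p_ab + d_b * p_ba)


p_ab, p_ba, d_a, d_b = 0.4, 0.25, 18.0, 9.0   # illustrative values only
pi_a_iter = long_run_prob_a(p_ab, p_ba)
pi_a_closed = p_ba / (p_ab + p_ba)
via_pi = proportion_from_pi(pi_a_closed, d_a, d_b)
via_p = proportion_from_transitions(p_ab, p_ba, d_a, d_b)
print(pi_a_iter, pi_a_closed, via_pi, via_p)
```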

Neither Salganik and Heckathorn [1] nor Zhao [5] provides variance estimators for their estimators because of the complexity of their sampling and estimation techniques. Next we describe a simplified sampling and estimation procedure for estimating P_{A} and P_{B}, and the corresponding variances of the estimators.

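We consider the one-wave sampling design of Zhao [5]. Let A and B represent the two groups in the same setting as Salganik and Heckathorn [1]. We define new random variables Z_{Ai} and Z_{Bi}, which represent the total number of friendships radiating from the i^{th} individual in group A to individuals in group B and the total number of friendships radiating from the i^{th} individual in group B to individuals in group A respectively. Here the within-group friendships are ignored. We define

$$\bar{Z}_{A} = \frac{1}{N_{A}} \sum_{i=1}^{N_{A}} Z_{Ai} = \frac{T_{AB}}{N_{A}}, \qquad \bar{Z}_{B} = \frac{1}{N_{B}} \sum_{i=1}^{N_{B}} Z_{Bi} = \frac{T_{BA}}{N_{B}}, \tag{7}$$

which represent the average degree of association from group A to group B and from group B to group A respectively. If we treat {Z_{Ai} : i = 1, ..., N_{A}} and {Z_{Bi} : i = 1, ..., N_{B}} as two sub-populations, then $\bar{Z}_{A}$ and $\bar{Z}_{B}$ are the corresponding sub-population means.

Since T_{AB} = T_{BA}, from (7) we can derive that

$$N_{A} \bar{Z}_{A} = N_{B} \bar{Z}_{B}. \tag{8}$$

Substituting (8) into

$$P_{A} = \frac{N_{A}}{N_{A} + N_{B}}, \qquad P_{B} = \frac{N_{B}}{N_{A} + N_{B}}, \tag{9}$$

we obtain

$$P_{A} = \frac{\bar{Z}_{B}}{\bar{Z}_{A} + \bar{Z}_{B}}, \qquad P_{B} = \frac{\bar{Z}_{A}}{\bar{Z}_{A} + \bar{Z}_{B}}. \tag{10}$$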

Therefore consistent estimates of P_{A} and P_{B} can be obtained if both $\bar{Z}_{A}$ and $\bar{Z}_{B}$ can be estimated consistently. Note that $\bar{Z}_{A}$ and $\bar{Z}_{B}$ contain only the between-group friendships; the within-group friendships are completely ignored. The above result indicates that (i) consistent estimation of the proportions P_{A} and P_{B} can be achieved using only the information on the between-group friendships; and (ii) the one-wave (or two-wave) sampling design of Zhao [5] can be further simplified, since for the individuals selected in the sample we only need to record how many friendships they have in the other group.
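The identity behind this result can be checked numerically on a toy network. The sketch below uses hypothetical group sizes and a hypothetical number of randomly placed between-group friendships; it is only meant to show that the ratio of average between-group degrees recovers the group proportion, whatever the pattern of ties.

```python
import random

random.seed(1)
n_a, n_b = 30, 70                       # hypothetical group sizes
z_a = [0] * n_a                         # Z_Ai: ties from member i of A into B
z_b = [0] * n_b                         # Z_Bj: ties from member j of B into A

# Draw distinct random between-group friendships; each tie is counted once
# on the A side and once on the B side, so sum(z_a) == sum(z_b) == T_AB.
pairs = [(i, j) for i in range(n_a) for j in range(n_b)]
for i, j in random.sample(pairs, 400):
    z_a[i] += 1
    z_b[j] += 1

zbar_a = sum(z_a) / n_a
zbar_b = sum(z_b) / n_b
p_a = zbar_b / (zbar_a + zbar_b)
print(p_a, n_a / (n_a + n_b))
```

No within-group friendships are generated at all, which mirrors point (i): they play no role in the calculation.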

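In practice, assume that a sample is drawn from a target population with two groups A and B. We record the total number of friendships radiating to the other group, Z_{Ai} or Z_{Bi}, for each individual selected from group A or B. Let $\hat{\bar{Z}}_{A}$ and $\hat{\bar{Z}}_{B}$ be estimators of the sub-population means $\bar{Z}_{A}$ and $\bar{Z}_{B}$ respectively. Then the proportions P_{A} and P_{B} can be estimated as

$$\hat{P}_{A} = \frac{\hat{\bar{Z}}_{B}}{\hat{\bar{Z}}_{A} + \hat{\bar{Z}}_{B}}, \qquad \hat{P}_{B} = \frac{\hat{\bar{Z}}_{A}}{\hat{\bar{Z}}_{A} + \hat{\bar{Z}}_{B}}, \tag{11}$$

and the variances can be estimated using the delta method as

$$\widehat{\mathrm{Var}}(\hat{P}_{A}) = \widehat{\mathrm{Var}}(\hat{P}_{B}) = \frac{\hat{\bar{Z}}_{B}^{2}\, \widehat{\mathrm{Var}}(\hat{\bar{Z}}_{A}) + \hat{\bar{Z}}_{A}^{2}\, \widehat{\mathrm{Var}}(\hat{\bar{Z}}_{B}) - 2 \hat{\bar{Z}}_{A} \hat{\bar{Z}}_{B}\, \widehat{\mathrm{Cov}}(\hat{\bar{Z}}_{A}, \hat{\bar{Z}}_{B})}{(\hat{\bar{Z}}_{A} + \hat{\bar{Z}}_{B})^{4}}. \tag{12}$$

For example, if $\bar{z}_{A}$ and $\bar{z}_{B}$ are the sample means of simple random samples selected from groups A and B independently, then the covariance term vanishes and we can compute an estimate of the variance as

$$\widehat{\mathrm{Var}}(\hat{P}_{A}) = \frac{\bar{z}_{B}^{2} S^{2}_{A} / n_{A} + \bar{z}_{A}^{2} S^{2}_{B} / n_{B}}{(\bar{z}_{A} + \bar{z}_{B})^{4}} \tag{13}$$

when the finite population correction factors can be ignored. Here S^{2}_{A} and S^{2}_{B} are the sample variances for groups A and B respectively.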
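A minimal Python sketch of this calculation follows. It assumes independent simple random samples from the two groups and ignores the finite population correction. The function name `estimate_p_a` and the sample values are hypothetical, and the normal-approximation interval at the end is an illustration rather than part of the derivation. The variance comes from the delta method applied to g(a, b) = b/(a + b).

```python
import math
import statistics


def estimate_p_a(z_a_sample, z_b_sample):
    """Return (P_A_hat, se) from independent simple random samples."""
    n_a, n_b = len(z_a_sample), len(z_b_sample)
    zbar_a = statistics.fmean(z_a_sample)
    zbar_b = statistics.fmean(z_b_sample)
    s2_a = statistics.variance(z_a_sample)   # sample variance, divisor n - 1
    s2_b = statistics.variance(z_b_sample)
    p_a_hat = zbar_b / (zbar_a + zbar_b)
    # Delta method for g(a, b) = b / (a + b):
    #   dg/da = -b / (a + b)^2,  dg/db = a / (a + b)^2
    var_hat = (zbar_b**2 * s2_a / n_a + zbar_a**2 * s2_b / n_b) / (zbar_a + zbar_b) ** 4
    return p_a_hat, math.sqrt(var_hat)


# Hypothetical counts of between-group friendships reported by respondents.
z_a_sample = [12, 7, 15, 9, 11, 6]
z_b_sample = [4, 6, 3, 5, 7, 5]
p_a_hat, se = estimate_p_a(z_a_sample, z_b_sample)
low, high = p_a_hat - 1.96 * se, p_a_hat + 1.96 * se   # normal-approximation 95% CI
print(round(p_a_hat, 4), round(se, 4), (round(low, 4), round(high, 4)))
```

Because the estimate of P_{B} is one minus the estimate of P_{A}, the same standard error applies to both proportions.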

In Appendix 1 we show that our proposed estimators for P_{A} and P_{B} are equivalent to the estimators of Salganik and Heckathorn [1]. They are, however, much simpler, which allows us to construct a formula to estimate their variances analytically.

In this section we use a simulation study to examine the small-sample performance of the proposed sampling and estimation method. We consider a setting similar to that of Salganik and Heckathorn [1].

The numbers of friendships D_{Ai} and D_{Bi} are generated from exponential distributions with means μ_{A} and μ_{B} for groups A and B respectively, and are rounded to the nearest integer. Let I denote the interconnectedness, and set T_{AB}=T_{BA}=I × min(R_{A}, R_{B}). We generate data for N_{A}=3,000, N_{B}=7,000, μ_{A}=20, μ_{B}=10, and I=0.6. We select simple random samples of sizes n_{A} and n_{B} from groups A and B independently. Equations (11) and (13) are used to estimate P_{A} and P_{B} and the corresponding standard errors (se.'s). **Table 1** shows the results for estimating P_{A} based on 10,000 replications for different sample sizes (n_{A}, n_{B}). We see that all the biases are close to 0, the means of the standard errors are close to the empirical standard deviations (sd.), and the 95% coverage probabilities are close to the nominal value. The results indicate that the overall performance of the proposed method is acceptable for practical implementation.

(n_{A}, n_{B}) | Bias | Mean of se.'s | 95% CP | Empirical sd. |
---|---|---|---|---|
(10,10) | -0.0003 | 0.0341 | 0.9326 | 0.0342 |
(15,15) | 0.0005 | 0.0277 | 0.9335 | 0.0281 |
(20,20) | -0.0001 | 0.0244 | 0.9379 | 0.0246 |
(25,25) | 0.000026 | 0.0216 | 0.9408 | 0.0218 |
(30,30) | -0.0002 | 0.0197 | 0.9447 | 0.0199 |
(35,35) | -0.0001 | 0.0186 | 0.9478 | 0.0185 |

**Table 1:** Estimation of the population proportion (P_{A}) in small samples.
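A simulation of this kind can be sketched in Python as below. The text does not state how the T_{AB} between-group friendships are distributed over individuals, so the `cross_ties` helper assumes they are drawn at random from each group's friendship "stubs". That mechanism, the helper names, the seed, and the reduced number of replications are all assumptions. The sketch is therefore not expected to reproduce the figures in Table 1, only the structure of the experiment.

```python
import math
import random
import statistics

random.seed(2017)
N_A, N_B, MU_A, MU_B, I = 3000, 7000, 20, 10, 0.6


def cross_ties(degrees, total_ties):
    """Spread total_ties between-group ties over members (assumed mechanism)."""
    stubs = [i for i, d in enumerate(degrees) for _ in range(d)]
    counts = [0] * len(degrees)
    for owner in random.sample(stubs, total_ties):
        counts[owner] += 1
    return counts


# Degrees: exponential draws rounded to the nearest integer.
deg_a = [round(random.expovariate(1 / MU_A)) for _ in range(N_A)]
deg_b = [round(random.expovariate(1 / MU_B)) for _ in range(N_B)]
t_ab = int(I * min(sum(deg_a), sum(deg_b)))      # T_AB = T_BA
z_a, z_b = cross_ties(deg_a, t_ab), cross_ties(deg_b, t_ab)
true_p_a = N_A / (N_A + N_B)


def one_replication(n_a, n_b):
    """Draw independent simple random samples; return (estimate, se)."""
    s_a, s_b = random.sample(z_a, n_a), random.sample(z_b, n_b)
    m_a, m_b = statistics.fmean(s_a), statistics.fmean(s_b)
    var = (m_b**2 * statistics.variance(s_a) / n_a
           + m_a**2 * statistics.variance(s_b) / n_b) / (m_a + m_b) ** 4
    return m_b / (m_a + m_b), math.sqrt(var)


results = [one_replication(20, 20) for _ in range(2000)]
estimates = [p for p, _ in results]
bias = statistics.fmean(estimates) - true_p_a
mean_se = statistics.fmean([se for _, se in results])
emp_sd = statistics.stdev(estimates)
coverage = statistics.fmean(
    [1.0 if abs(p - true_p_a) <= 1.96 * se else 0.0 for p, se in results]
)
print(f"bias={bias:.4f} mean_se={mean_se:.4f} sd={emp_sd:.4f} cp={coverage:.3f}")
```

The four printed quantities correspond to the columns of Table 1 for a single sample-size pair. Changing the arguments of `one_replication` gives the other rows.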

This research describes a simplified sampling and estimation procedure for estimating the population proportion of a hidden population. The new method improves on the methodology of Salganik and Heckathorn [1] by simplifying the formula of their estimator [1,6] and by providing an analytic formula for estimating the variance of the proposed estimator. The simplified estimator shows that consistent estimation of the population proportion does not depend on information about within-group social network relationships, which allows us to further simplify the one-wave sampling procedure of Zhao [5], where the random recruitment assumption is not required.

The new sampling and estimation method is motivated by the idea in Zhao [5] of simplifying the sampling procedure of respondent-driven sampling. Zhao [5] proposes the one-wave sampling design, in which information on the social network relationships is observed completely for each individual selected in the sample and random recruitment is not required. We would expect that the social network relationships can be estimated more efficiently under this design. However, Zhao [5] does not supply a new estimator for computing consistent estimates of the population proportions and ultimately uses the estimator of Salganik and Heckathorn [1], which is functionally complicated and for which analytic variance estimation is not available.

In applied statistics, simple and efficient methods are always respectable. We hope that the proposed methods can be used to improve some studies in epidemiology and social problems.

This research was supported by grants from the Natural Sciences and Engineering Research Council of Canada (YZ).

- Salganik MJ, Heckathorn DD (2004) Sampling and estimation in hidden populations using respondent-driven sampling. Sociological Methodology 34: 193-239.
- Watters JK, Biernacki P (1989) Targeted sampling: Options for the study of hidden populations. Social Problems 36: 416-430.
- Muhib FB, Lin LS, Stueve A, Miller RL, Ford WL, et al. (2001) A venue-based method for sampling hard-to-reach populations. Public Health Reports 116: 216-222.
- Coleman JS (1958) Relational analysis: The study of social organization with survey methods. Human Organization 17: 28-36.
- Zhao Y (2011) Estimating the size of an injecting drug user population. World Journal of AIDS 1: 88-93.
- Heckathorn DD (1997) Respondent-driven sampling: A new approach to the study of hidden populations. Social Problems 44: 174-199.
