Sampling and Estimation in Hidden Population Using Social Network | OMICS International
ISSN 2155-6113
Journal of AIDS & Clinical Research

Sampling and Estimation in Hidden Population Using Social Network

Yang Zhao*

Department of Mathematics and Statistics, University of Regina, Canada

Corresponding Author:
Yang Zhao
Department of Mathematics and Statistics
University of Regina, College West 307.14
Regina, SK, S4S 0A2, Canada
Tel: 306 585-4348
Fax: 306 585-4020
E-mail: [email protected]

Received date: February 07, 2017; Accepted date: February 20, 2017; Published date: February 27, 2017

Citation: Zhao Y (2017) Sampling and Estimation in Hidden Population Using Social Network. J AIDS Clin Res 8:667. doi:10.4172/2155-6113.1000667

Copyright: © 2017 Zhao Y. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Abstract

Characteristics of hidden populations (e.g., the population of injection drug users) cannot be studied using standard sampling and estimation procedures. This article considers methods for estimating the population proportion of a hidden population using social network relationships. We compare the respondent-driven sampling and estimation technique with the simplified sampling procedure based on a Markov-chain model and discuss the equivalence of these procedures. Because of the complexity of their methods, these procedures do not provide formulae for estimating the variances of their estimators. We describe a simplified sampling procedure for collecting data on both the population and its social network, and provide a simple formula for estimating the population proportion efficiently. We further derive a formula for estimating the variance of the proposed estimator using the delta method. A simulation study is provided to illustrate the new sampling and estimation method.

Keywords

Hidden population; Markov-chain model; Population proportion; Respondent-driven sampling; Social network relationships.

Introduction

Special populations that cannot be studied using standard sampling and estimation procedures are called hidden populations. Examples include injection drug users, men who have sex with men, illegal immigrants, and the homeless. Consistent estimation of the size of these populations is crucial for researchers and policy makers.

Salganik and Heckathorn [1] provide a comprehensive review of sampling and estimation methods for studying hidden populations, including targeted sampling and time-space sampling [3]. They note that these methods often fail to provide accurate estimates of the true values, and that a common drawback of most of them is that they do not use the social network relationships present in many hidden populations, that is, the network of relationships (for example, friendships) among the real people in the population. They propose a sampling and estimation method based on a snowball-type sampling design [4], called respondent-driven sampling, which uses the social network relationships in a hidden population to collect information from the population of interest in such a way that unbiased estimation of the population characteristics is possible. However, the consistency of their estimator depends on the assumption that individuals are randomly recruited into the study. Multi-wave and snowball-type sampling designs are also costly in both time and money.

To avoid the multi-wave sampling procedure, Zhao [5] introduces a Markov-chain model for the social network relationships. The model computes the long-run (stationary) probabilities from Markov-chain theory and then estimates the population proportion using the result of Salganik and Heckathorn [1]. However, neither Salganik and Heckathorn [1] nor Zhao [5] provides formulae for estimating the variances of the estimators, owing to the complexity of their sampling and estimation procedures.

In this article we describe a simplified sampling procedure that collects information on both the population and its social network relationships simultaneously, and we derive a consistent estimator of the population proportion of a hidden population based on this design. The rest of the article is organized as follows. In Section 2 we briefly review the respondent-driven sampling method of Salganik and Heckathorn [1] and the Markov-chain model of Zhao [5], and discuss the equivalence of the two approaches. A simplified sampling and estimation procedure is described in Section 3, where a formula for estimating the variance of the proposed estimator is also derived. Section 4 presents a simulation study of the small-sample performance of the proposed method, and Section 5 concludes with a brief discussion.

Estimation Methods Using Social Network

Respondent-driven sampling

The main idea of respondent-driven sampling (Heckathorn [6]; Salganik and Heckathorn [1]) is summarized as follows.

Assume that a population is divided into two groups, A and B, whose members are connected by social network relationships, say friendships. Let $N_A$ and $N_B$ be the numbers of people in groups A and B respectively, and let $P_A = N_A/(N_A + N_B)$ and $P_B = 1 - P_A$ be the finite population proportions of groups A and B. The objective is to estimate $P_A$ and $P_B$. Let $D_{Ai}$ be the number of friendships of the $i$th individual in group A, also called the degree of that individual. The total number of friendships radiating from individuals in group A is

$$R_A = \sum_{i=1}^{N_A} D_{Ai},$$

and $R_B$ is defined similarly for group B.

Define

$$C_{AB} = \frac{T_{AB}}{R_A}, \qquad \bar{D}_A = \frac{R_A}{N_A}. \tag{1}$$

Here $T_{AB} = T_{BA}$ is the total number of friendships between members of group A and members of group B, $C_{AB}$ is the proportion of the friendships radiating from group A that go to group B, and $\bar{D}_A$ is called the average degree of people in group A. $\bar{D}_B$ and $C_{BA}$ are defined similarly. Salganik and Heckathorn [1] show that

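$$P_A = \frac{C_{BA}\,\bar{D}_B}{C_{AB}\,\bar{D}_A + C_{BA}\,\bar{D}_B}, \qquad P_B = 1 - P_A, \tag{2}$$

which follows from $T_{AB} = T_{BA}$, since $T_{AB} = N_A C_{AB} \bar{D}_A$ and $T_{BA} = N_B C_{BA} \bar{D}_B$.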
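Identity (2) holds for any network, so it can be checked numerically. The Python sketch below builds a small hypothetical two-group friendship network as a random undirected edge list, computes $R_A$, $T_{AB}$, $C_{AB}$ and $\bar{D}_A$ (and their group-B counterparts) from the definitions above, and confirms that (2) returns $N_A/(N_A+N_B)$. The helper `network_quantities` and all numbers are illustrative assumptions, not part of the original study; exact fractions are used so that the identity holds without rounding error.

```python
import random
from fractions import Fraction

def network_quantities(groups, edges):
    """Return (R, T_between, N) for an undirected two-group friendship network.

    groups: dict mapping person id -> "A" or "B"
    edges:  list of (u, v) pairs, each one undirected friendship
    """
    R = {"A": 0, "B": 0}   # total number of friendships radiating from each group
    N = {"A": 0, "B": 0}   # group sizes
    for g in groups.values():
        N[g] += 1
    T_between = 0          # T_AB = T_BA: number of cross-group friendships
    for u, v in edges:
        R[groups[u]] += 1
        R[groups[v]] += 1
        if groups[u] != groups[v]:
            T_between += 1
    return R, T_between, N

rng = random.Random(1)
# Hypothetical population: 30 people in A, 70 in B, 400 distinct random friendships.
groups = {i: ("A" if i < 30 else "B") for i in range(100)}
edges = set()
while len(edges) < 400:
    u, v = rng.sample(range(100), 2)
    edges.add((min(u, v), max(u, v)))
R, T, N = network_quantities(groups, list(edges))

C_AB, C_BA = Fraction(T, R["A"]), Fraction(T, R["B"])     # equation (1)
D_A, D_B = Fraction(R["A"], N["A"]), Fraction(R["B"], N["B"])
P_A = C_BA * D_B / (C_AB * D_A + C_BA * D_B)               # equation (2)
print(P_A)   # 3/10, i.e. N_A / (N_A + N_B)
```

Because $C_{AB}\bar{D}_A = T_{AB}/N_A$ and $C_{BA}\bar{D}_B = T_{BA}/N_B$, the result depends only on $T_{AB} = T_{BA}$, not on how the edges were drawn.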

In practice, respondents are selected from the social network using the respondent-driven sampling design of Heckathorn [6]: a small number of initial seeds is selected first, current respondents then randomly recruit friends into the sample, and the recruiting process continues until the required sample size is reached. Let $r_{AB}$ be the total number of recruitments from individuals in group A to individuals in group B, and $r_{AA}$ the total number of recruitments from individuals in group A to other individuals in the same group; $r_{BA}$ and $r_{BB}$ are defined similarly. Under the random recruitment assumption, $C_{AB}$, $C_{BA}$, $\bar{D}_A$ and $\bar{D}_B$ can be consistently estimated by

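$$\hat{C}_{AB} = \frac{r_{AB}}{r_{AA} + r_{AB}}, \qquad \hat{C}_{BA} = \frac{r_{BA}}{r_{BA} + r_{BB}}, \qquad \hat{\bar{D}}_A = \frac{n_A}{\sum_{i=1}^{n_A} 1/d_{Ai}}, \qquad \hat{\bar{D}}_B = \frac{n_B}{\sum_{i=1}^{n_B} 1/d_{Bi}}, \tag{3}$$

respectively, where $n_A$ and $n_B$ are the numbers of individuals selected from groups A and B, and $d_{Ai}$ and $d_{Bi}$ are the reported degrees of the $i$th sampled individual in each group. The population proportions can then be estimated by substituting (3) into (2). Salganik and Heckathorn [1] show that these estimators are asymptotically unbiased regardless of how the initial seeds are selected.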
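A minimal sketch of the plug-in estimator obtained by substituting (3) into (2). It assumes the recruitment counts $r_{AA}$, $r_{AB}$, $r_{BA}$, $r_{BB}$ have already been tallied and that the degree estimator in (3) is the harmonic mean of reported degrees (the form Salganik and Heckathorn [1] use to correct for degree-proportional selection). The function name `rds_proportion` and the toy inputs are hypothetical.

```python
def rds_proportion(r_AA, r_AB, r_BA, r_BB, degrees_A, degrees_B):
    """Plug-in respondent-driven sampling estimate of (P_A, P_B).

    r_XY: number of recruitments from group X to group Y.
    degrees_A, degrees_B: reported degrees of the n_A and n_B sampled people
    (assumed positive, since every respondent has at least one tie).
    """
    C_AB = r_AB / (r_AA + r_AB)   # share of A's recruitments that go to B
    C_BA = r_BA / (r_BA + r_BB)   # share of B's recruitments that go to A
    # Harmonic-mean degree estimates (assumed form of the degree part of (3)).
    D_A = len(degrees_A) / sum(1 / d for d in degrees_A)
    D_B = len(degrees_B) / sum(1 / d for d in degrees_B)
    P_A = C_BA * D_B / (C_AB * D_A + C_BA * D_B)   # substitute into (2)
    return P_A, 1 - P_A

# Hypothetical toy data: recruitment tallies and reported degrees.
p_A, p_B = rds_proportion(30, 20, 20, 80, [10, 20, 20, 40], [5, 10, 10, 20])
print(round(p_A, 6))   # 0.2
```

The harmonic mean is used because, under respondent-driven sampling, people with many friendships are more likely to be recruited, so the arithmetic mean of sampled degrees would overstate $\bar{D}_A$ and $\bar{D}_B$.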

The Markov-chain model

An important contribution of Zhao [5] is a one-wave sampling design for collecting information about the population and its social network relationships. In this design, selected individuals are required to recruit all their friends into the study, the numbers of their friendships in group A and in group B are recorded, and the random recruitment assumption is not required. Zhao [5] further describes a Markov-chain model for the social network relationships. Instead of groups A and B, two states, A and B, are defined, and each individual is assumed to be in exactly one of them. Suppose individuals are selected using the respondent-driven sampling design, let $P_{AB}$ be the probability that a randomly selected individual in state A recruits an individual in state B, and let $P_{AA} = 1 - P_{AB}$ be the probability that a randomly selected individual in state A recruits an individual in state A; $P_{BA}$ and $P_{BB}$ are defined similarly. The transition probability matrix of the first-order Markov-chain model is then

$$\mathbf{P} = \begin{pmatrix} P_{AA} & P_{AB} \\ P_{BA} & P_{BB} \end{pmatrix}.$$

Provided $\mathbf{P}$ is an irreducible, ergodic transition matrix, in the long run the probability that an individual in state A is selected is

$$\pi_A = \frac{P_{BA}}{P_{AB} + P_{BA}}, \tag{4}$$
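Equation (4) is the stationary distribution of the two-state chain. The sketch below, with arbitrary illustrative transition probabilities, compares the closed form with the result of repeatedly applying the transition matrix to a starting distribution (power iteration). The function names are hypothetical.

```python
def stationary_closed_form(p_AB, p_BA):
    """pi_A from equation (4) for a two-state chain."""
    return p_BA / (p_AB + p_BA)

def stationary_by_iteration(p_AB, p_BA, steps=200):
    """Approximate pi_A by pushing a distribution through P many times."""
    pi_A, pi_B = 1.0, 0.0   # start every walk in state A
    for _ in range(steps):
        # One step of pi <- pi P, with P = [[1 - p_AB, p_AB], [p_BA, 1 - p_BA]].
        pi_A, pi_B = (pi_A * (1 - p_AB) + pi_B * p_BA,
                      pi_A * p_AB + pi_B * (1 - p_BA))
    return pi_A

p_AB, p_BA = 0.3, 0.1   # illustrative values only
print(stationary_closed_form(p_AB, p_BA))    # approximately 0.25
print(stationary_by_iteration(p_AB, p_BA))   # approximately 0.25
```

The iteration converges geometrically at rate $|1 - P_{AB} - P_{BA}|$, which is below 1 whenever the chain is irreducible and aperiodic, so the starting state does not matter in the long run.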

and $\pi_B = 1 - \pi_A$. To estimate the population proportion, Zhao [5] then recommends using the result of Salganik and Heckathorn [1]:

$$P_A = \frac{\pi_A/\bar{D}_A}{\pi_A/\bar{D}_A + \pi_B/\bar{D}_B}, \qquad P_B = 1 - P_A. \tag{5}$$

In practice, computing estimates of $P_A$ and $P_B$ from these formulae requires estimates of $\bar{D}_A$, $\bar{D}_B$, $P_{AB}$ and $P_{BA}$. Substituting (4) into (5) directly, however, gives

$$P_A = \frac{P_{BA}\,\bar{D}_B}{P_{AB}\,\bar{D}_A + P_{BA}\,\bar{D}_B}. \tag{6}$$
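Because (6) follows from (4) and (5) by algebra alone, the substitution can be checked numerically. The sketch assumes (5) takes the inverse-degree-weighted form used by Salganik and Heckathorn [1], in which each long-run selection probability is divided by the corresponding average degree; the function names and inputs are illustrative.

```python
def p_A_via_stationary(p_AB, p_BA, D_A, D_B):
    """Equation (5), with pi_A and pi_B taken from equation (4)."""
    pi_A = p_BA / (p_AB + p_BA)
    pi_B = 1 - pi_A
    # Reweight the long-run selection probabilities by inverse average degree.
    return (pi_A / D_A) / (pi_A / D_A + pi_B / D_B)

def p_A_closed_form(p_AB, p_BA, D_A, D_B):
    """Equation (6): the result of substituting (4) into (5)."""
    return p_BA * D_B / (p_AB * D_A + p_BA * D_B)

# Arbitrary illustrative inputs: (p_AB, p_BA, average degree A, average degree B).
cases = [(0.3, 0.1, 20.0, 10.0), (0.6, 0.2, 8.0, 12.0)]
for case in cases:
    print(p_A_via_stationary(*case), p_A_closed_form(*case))
```

Each line prints the same value twice (up to floating-point rounding), and (6) has exactly the form of (2) with $P_{AB}$, $P_{BA}$ in place of $C_{AB}$, $C_{BA}$.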

Comparing (2) and (6), it is easy to see that $P_{AB}$, $P_{BA}$ and $C_{AB}$, $C_{BA}$ measure essentially the same quantities in the two models, so the two methods are equivalent.

Neither Salganik and Heckathorn [1] nor Zhao [5] provides variance estimators for their estimators because of the complexity of their sampling and estimation techniques. We next describe a simplified sampling and estimation procedure for $P_A$ and $P_B$, together with estimators of the corresponding variances.

A Simplified Sampling and Estimating Method

We consider the one-wave sampling design of Zhao [5], with groups A and B in the same setting as Salganik and Heckathorn [1]. We define new random variables $Z_{Ai}$ and $Z_{Bi}$: $Z_{Ai}$ is the total number of friendships radiating from the $i$th individual in group A to individuals in group B, and $Z_{Bi}$ is the total number of friendships radiating from the $i$th individual in group B to individuals in group A. Within-group friendships are ignored. We define

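$$\bar{Z}_A = \frac{1}{N_A}\sum_{i=1}^{N_A} Z_{Ai} = \frac{T_{AB}}{N_A}, \qquad \bar{Z}_B = \frac{1}{N_B}\sum_{i=1}^{N_B} Z_{Bi} = \frac{T_{BA}}{N_B}; \tag{7}$$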

these are the average numbers of friendships from group A to group B and from group B to group A, respectively. If we treat $\{Z_{Ai} : i = 1, \dots, N_A\}$ and $\{Z_{Bi} : i = 1, \dots, N_B\}$ as two sub-populations, then $\bar{Z}_A$ and $\bar{Z}_B$ are the corresponding sub-population means.

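Since $T_{AB} = T_{BA}$, it follows from (7) that

$$N_A \bar{Z}_A = N_B \bar{Z}_B. \tag{8}$$

Substituting (8), in the form $N_A/N_B = \bar{Z}_B/\bar{Z}_A$, into

$$P_A = \frac{N_A}{N_A + N_B}, \qquad P_B = \frac{N_B}{N_A + N_B}, \tag{9}$$

we obtain

$$P_A = \frac{\bar{Z}_B}{\bar{Z}_A + \bar{Z}_B}, \qquad P_B = \frac{\bar{Z}_A}{\bar{Z}_A + \bar{Z}_B}. \tag{10}$$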
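Identity (10) can be checked on any population. The sketch below generates a hypothetical population in which each cross-group friendship is assigned to one random member of A and one random member of B, so that $T_{AB} = T_{BA}$ holds by construction (repeated pairs are allowed, since only the counts matter here). It then computes $\bar{Z}_A$ and $\bar{Z}_B$ as in (7) and confirms that (10) returns $N_A/(N_A+N_B)$. All sizes are illustrative.

```python
import random
from fractions import Fraction

rng = random.Random(7)
N_A, N_B, T_AB = 40, 60, 250   # hypothetical group sizes and cross-group ties

# Each cross-group friendship adds one to some Z_Ai and one to some Z_Bj,
# so sum(Z_A) == sum(Z_B) == T_AB, i.e. T_AB = T_BA.
Z_A = [0] * N_A
Z_B = [0] * N_B
for _ in range(T_AB):
    Z_A[rng.randrange(N_A)] += 1
    Z_B[rng.randrange(N_B)] += 1

Zbar_A = Fraction(sum(Z_A), N_A)   # equation (7)
Zbar_B = Fraction(sum(Z_B), N_B)
P_A = Zbar_B / (Zbar_A + Zbar_B)   # equation (10)
print(P_A)   # 2/5, i.e. N_A / (N_A + N_B)
```

Within-group friendships never enter the calculation, which is the point made in (i) and (ii) below.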

Therefore consistent estimates of $P_A$ and $P_B$ can be obtained whenever $\bar{Z}_A$ and $\bar{Z}_B$ can be estimated consistently. Note that $\bar{Z}_A$ and $\bar{Z}_B$ involve only between-group friendships; within-group friendships are ignored entirely. This result indicates that (i) consistent estimation of the proportions $P_A$ and $P_B$ can be achieved using only information on between-group friendships, and (ii) the one-wave (or two-wave) sampling design of Zhao [5] can be simplified further: for each individual selected in the sample, we need only record how many friendships he or she has in the other group.

In practice, assume that a sample is drawn from a target population with two groups, A and B. For each individual selected from group A or B we record the total number of friendships radiating to the other group, $Z_{Ai}$ or $Z_{Bi}$. Let $\hat{Z}_A$ and $\hat{Z}_B$ be estimators of the sub-population means $\bar{Z}_A$ and $\bar{Z}_B$ respectively; the proportions $P_A$ and $P_B$ can then be estimated as

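$$\hat{P}_A = \frac{\hat{Z}_B}{\hat{Z}_A + \hat{Z}_B}, \qquad \hat{P}_B = \frac{\hat{Z}_A}{\hat{Z}_A + \hat{Z}_B}, \tag{11}$$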

and the variances can be estimated using the delta method as

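$$\operatorname{Var}(\hat{P}_A) = \operatorname{Var}(\hat{P}_B) \approx \frac{\bar{Z}_B^2 \operatorname{Var}(\hat{Z}_A) + \bar{Z}_A^2 \operatorname{Var}(\hat{Z}_B) - 2\bar{Z}_A\bar{Z}_B \operatorname{Cov}(\hat{Z}_A, \hat{Z}_B)}{(\bar{Z}_A + \bar{Z}_B)^4}, \tag{12}$$

with the unknown quantities replaced by their estimates.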
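For readers checking (12), the gradient behind it is sketched below. This is the standard first-order (delta-method) expansion of $\hat{P}_A = g(\hat{Z}_A, \hat{Z}_B)$ with $g(x, y) = y/(x + y)$ around $(x, y) = (\bar{Z}_A, \bar{Z}_B)$, written out here rather than taken from the original:

```latex
\frac{\partial g}{\partial x} = -\frac{y}{(x+y)^2}, \qquad
\frac{\partial g}{\partial y} = \frac{x}{(x+y)^2}, \qquad
\operatorname{Var}(\hat{P}_A) \approx \nabla g^{\top}\,\Sigma\,\nabla g, \quad
\Sigma = \begin{pmatrix}
  \operatorname{Var}(\hat{Z}_A) & \operatorname{Cov}(\hat{Z}_A, \hat{Z}_B) \\
  \operatorname{Cov}(\hat{Z}_A, \hat{Z}_B) & \operatorname{Var}(\hat{Z}_B)
\end{pmatrix}.
```

Expanding the quadratic form gives the numerator $y^2 \operatorname{Var}(\hat{Z}_A) + x^2 \operatorname{Var}(\hat{Z}_B) - 2xy \operatorname{Cov}(\hat{Z}_A, \hat{Z}_B)$ over $(x+y)^4$, which is (12). Since $\hat{P}_B = 1 - \hat{P}_A$, the two estimators share the same variance.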

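For example, if $\hat{Z}_A = \bar{z}_A$ and $\hat{Z}_B = \bar{z}_B$ are the sample means of simple random samples of sizes $n_A$ and $n_B$ selected independently from groups A and B, then an estimate of the variance is

$$\widehat{\operatorname{Var}}(\hat{P}_A) = \frac{\bar{z}_B^2\, s_A^2/n_A + \bar{z}_A^2\, s_B^2/n_B}{(\bar{z}_A + \bar{z}_B)^4}, \tag{13}$$

provided the finite population correction factors $1 - n_A/N_A$ and $1 - n_B/N_B$ can be ignored. Here $s_A^2$ and $s_B^2$ are the sample variances of the recorded counts $Z_{Ai}$ and $Z_{Bi}$ in groups A and B respectively.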
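Under simple random sampling, estimators (11) and (13) need only the recorded cross-group counts. A minimal Python sketch, assuming independent samples from the two groups and ignoring the finite population corrections as above; the function name and the data are hypothetical.

```python
import math
from statistics import mean, variance

def estimate_proportion(z_A, z_B):
    """Point estimate (11) and delta-method standard error (13) for P_A.

    z_A: between-group friendship counts Z_Ai from a simple random sample of A.
    z_B: the same counts Z_Bi from an independent simple random sample of B.
    Finite population corrections are ignored, as in (13).
    """
    n_A, n_B = len(z_A), len(z_B)
    zbar_A, zbar_B = mean(z_A), mean(z_B)
    s2_A, s2_B = variance(z_A), variance(z_B)   # sample variances, divisor n - 1
    total = zbar_A + zbar_B
    p_A = zbar_B / total                                                  # (11)
    var_p = (zbar_B**2 * s2_A / n_A + zbar_A**2 * s2_B / n_B) / total**4  # (13)
    return p_A, math.sqrt(var_p)

# Hypothetical responses: cross-group friendship counts reported by sampled people.
z_A = [12, 8, 15, 10, 5]
z_B = [4, 6, 5, 3, 7]
p_A, se = estimate_proportion(z_A, z_B)
low, high = p_A - 1.96 * se, p_A + 1.96 * se   # normal-approximation 95% interval
print(f"P_A = {p_A:.3f} (s.e. {se:.3f}); P_B = {1 - p_A:.3f}")
# P_A = 0.333 (s.e. 0.049); P_B = 0.667
```

Because $\hat{P}_B = 1 - \hat{P}_A$, the same standard error applies to $\hat{P}_B$. If the samples were not independent, the covariance term of (12) would also be needed.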

In Appendix 1 we show that the proposed estimators of $P_A$ and $P_B$ are equivalent to those of Salganik and Heckathorn [1]; however, they are much simpler, which allows their variances to be estimated analytically.

Simulation Study

In this section we use a simulation study to examine the small-sample performance of the proposed sampling and estimation method, in a setting similar to that of Salganik and Heckathorn [1].

The degrees $D_{Ai}$ and $D_{Bi}$ are generated from exponential distributions with means $\mu_A$ and $\mu_B$ for groups A and B respectively, and rounded to the nearest integer. Let $I$ denote the interconnectedness, and set $T_{AB} = T_{BA} = I \times \min(R_A, R_B)$. We generate data with $N_A = 3{,}000$, $N_B = 7{,}000$, $\mu_A = 20$, $\mu_B = 10$ and $I = 0.6$, and select simple random samples of sizes $n_A$ and $n_B$ from groups A and B independently. Equations (11) and (13) are used to estimate $P_A$ and $P_B$ and the corresponding standard errors (s.e.'s). Table 1 shows the results for estimating $P_A$ based on 10,000 replications for different sample sizes $(n_A, n_B)$. All the biases are close to 0, the means of the standard errors are close to the empirical standard deviations (s.d.), and the 95% coverage probabilities are close to the nominal value. These results indicate that the overall performance of the proposed method is acceptable for practical implementation.

(nA, nB)    Bias      Mean of s.e.'s   95% CP   Empirical s.d.
(10, 10)    -0.0003   0.0341           0.9326   0.0342
(15, 15)     0.0005   0.0277           0.9335   0.0281
(20, 20)    -0.0001   0.0244           0.9379   0.0246
(25, 25)     0.0426   0.0216           0.9408   0.0218
(30, 30)    -0.0002   0.0197           0.9447   0.0199
(35, 35)    -0.0001   0.0186           0.9478   0.0185

Table 1: Estimation of the population proportion PA in small samples.
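The simulation design described above can be sketched in Python as follows. The text does not state how the $T_{AB}$ cross-group friendships are allocated to individuals, so the sketch assumes, hypothetically, that each end of every cross-group tie is assigned to a group member with probability proportional to that member's degree (a member's count $Z_{Ai}$ may then occasionally exceed $D_{Ai}$, which this simplification ignores). Function names, the seed and the number of replications are illustrative, and under this assumed allocation the output will not necessarily match Table 1.

```python
import math
import random
from collections import Counter
from statistics import mean, stdev, variance

def make_population(N_A, N_B, mu_A, mu_B, I, rng):
    """Hypothetical generator of between-group friendship counts Z_A, Z_B."""
    # Degrees: exponential with means mu_A and mu_B, rounded to nearest integer.
    D_A = [round(rng.expovariate(1 / mu_A)) for _ in range(N_A)]
    D_B = [round(rng.expovariate(1 / mu_B)) for _ in range(N_B)]
    # Cross-group ties: T_AB = T_BA = I * min(R_A, R_B).
    T = round(I * min(sum(D_A), sum(D_B)))
    # ASSUMPTION: each end of every cross-group tie goes to a group member
    # chosen with probability proportional to degree (not specified in the text).
    ends_A = Counter(rng.choices(range(N_A), weights=D_A, k=T))
    ends_B = Counter(rng.choices(range(N_B), weights=D_B, k=T))
    return [ends_A[i] for i in range(N_A)], [ends_B[j] for j in range(N_B)]

def estimate(z_A, z_B):
    """Equation (11) and the s.e. from equation (13), ignoring the fpc."""
    zbar_A, zbar_B = mean(z_A), mean(z_B)
    total = zbar_A + zbar_B
    var_p = (zbar_B**2 * variance(z_A) / len(z_A)
             + zbar_A**2 * variance(z_B) / len(z_B)) / total**4
    return zbar_B / total, math.sqrt(var_p)

def simulate(n_A, n_B, reps, seed=2017):
    """Return (bias, mean s.e., 95% coverage, empirical s.d.) for P_A."""
    rng = random.Random(seed)
    N_A, N_B = 3000, 7000
    Z_A, Z_B = make_population(N_A, N_B, mu_A=20, mu_B=10, I=0.6, rng=rng)
    true_p = N_A / (N_A + N_B)   # equals (10), since sum(Z_A) == sum(Z_B)
    estimates, ses, covered = [], [], 0
    for _ in range(reps):
        # Independent simple random samples (without replacement) from A and B.
        p, se = estimate(rng.sample(Z_A, n_A), rng.sample(Z_B, n_B))
        estimates.append(p)
        ses.append(se)
        covered += abs(p - true_p) <= 1.96 * se
    return mean(estimates) - true_p, mean(ses), covered / reps, stdev(estimates)

bias, mean_se, coverage, emp_sd = simulate(30, 30, reps=2000)
print(f"bias={bias:.4f} mean_se={mean_se:.4f} CP={coverage:.3f} sd={emp_sd:.4f}")
```

Calling `simulate` for each pair $(n_A, n_B)$ in Table 1 with `reps=10_000` mirrors the structure of the study, although the numbers depend on the assumed tie-allocation scheme.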

Discussion

This research describes a simplified sampling and estimation procedure for estimating the population proportion of a hidden population. The new method improves on the methodology of Salganik and Heckathorn [1] by simplifying their estimator [1,6] and by providing an analytic formula for estimating the variance of the proposed estimator. The simplified estimator shows that consistent estimation of the population proportion does not require information on within-group social network relationships, which allows us to simplify further the one-wave sampling procedure of Zhao [5], in which the random recruitment assumption is not required.

The new sampling and estimation method is motivated by the idea, from Zhao [5], of simplifying the sampling procedure of respondent-driven sampling. Zhao [5] proposes the one-wave sampling design, in which the social network relationships of each individual selected in the sample are observed completely and random recruitment is not required, so the social network relationships can be expected to be estimated more efficiently under that design. However, Zhao [5] does not supply a new estimator of the population proportions and instead uses Salganik and Heckathorn's [1] estimator, which is functionally complicated and has no analytic variance estimator.

In applied statistics, simple and efficient methods are always desirable. We hope that the proposed method can be used to improve studies in epidemiology and of social problems.

Acknowledgement

This research was supported by grants from the Natural Sciences and Engineering Research Council of Canada (YZ).

References
