Handbook of Big Data Analytics
Wolfgang Karl Härdle, Henry Horng-Shing Lu, Xiaotong Shen
Publisher: Springer-Verlag, 2018
ISBN 9783319182841, 532 pages
Format: PDF, OL
Copy protection: watermark
Preface
Contents
Part I Overview
1 Statistics, Statisticians, and the Internet of Things
1.1 Introduction
1.1.1 The Internet of Things
1.1.2 What Is Big Data in an Internet of Things?
1.1.3 Building Blocks
1.1.4 Ubiquity
1.1.5 Consumer Applications
1.1.6 The Internets of [Infrastructure] Things
1.1.7 Industrial Scenarios
1.2 What Kinds of Statistics Are Needed for Big IoT Data?
1.2.1 Coping with Complexity
1.2.2 Privacy
1.2.3 Traditional Statistics Versus the IoT
1.2.4 A View of the Future of Statistics in an IoT World
1.3 Big Data in the Real World
1.3.1 Skills
1.3.2 Politics
1.3.3 Technique
1.3.4 Traditional Databases
1.3.5 Cognition
1.4 Conclusion
2 Cognitive Data Analysis for Big Data
2.1 Introduction
2.1.1 Big Data
2.1.2 Defining Cognitive Data Analysis
2.1.3 Stages of CDA
2.2 Data Preparation
2.2.1 Natural Language Query
2.2.2 Data Integration
2.2.3 Metadata Discovery
2.2.4 Data Quality Verification
2.2.5 Data Type Detection
2.2.6 Data Lineage
2.3 Automated Modeling
2.3.1 Descriptive Analytics
2.3.2 Predictive Analytics
2.3.3 Starting Points
2.3.4 System Recommendations
2.4 Application of Results
2.4.1 Gaining Insights
2.4.2 Sharing and Collaborating
2.4.3 Deployment
2.5 Use Case
2.6 Conclusion
References
Part II Methodology
3 Statistical Leveraging Methods in Big Data
3.1 Background
3.2 Leveraging Approximation for Least Squares Estimator
3.2.1 Leveraging for Least Squares Approximation
3.2.2 A Matrix Approximation Perspective
3.2.3 The Computation of Leveraging Scores
3.2.4 An Innovative Proposal: Predictor-Length Method
3.2.5 More on Modeling
3.2.6 Statistical Leveraging Algorithms in the Literature: A Summary
3.3 Statistical Properties of Leveraging Estimator
3.3.1 Weighted Leveraging Estimator
3.3.2 Unweighted Leveraging Estimator
3.4 Simulation Study
3.4.1 UNIF and BLEV
3.4.2 BLEV and LEVUNW
3.4.3 BLEV and SLEV
3.4.4 BLEV and PL
3.4.5 SLEV and PL
3.5 Real Data Analysis
3.6 Beyond Linear Regression
3.6.1 Logistic Regression
3.6.2 Time Series Analysis
3.7 Discussion and Conclusion
References
4 Scattered Data and Aggregated Inference
4.1 Introduction
4.2 Problem Formulation
4.2.1 Notations
4.2.2 Review on M-Estimators
4.2.3 Simple Averaging Estimator
4.2.4 One-Step Estimator
4.3 Main Results
4.3.1 Assumptions
4.3.2 Asymptotic Properties and Mean Squared Errors (MSE) Bounds
4.3.3 Under the Presence of Communication Failure
4.4 Numerical Examples
4.4.1 Logistic Regression
4.4.2 Beta Distribution
4.4.3 Beta Distribution with Possibility of Losing Information
4.4.4 Gaussian Distribution with Unknown Mean and Variance
4.5 Discussion on Distributed Statistical Inference
4.6 Other Problems
4.7 Conclusion
References
5 Nonparametric Methods for Big Data Analytics
5.1 Introduction
5.2 Classical Methods for Nonparametric Regression
5.2.1 Additive Models
5.2.2 Generalized Additive Models (GAM)
5.2.3 Smoothing Spline ANOVA (SS-ANOVA)
5.3 High Dimensional Additive Models
5.3.1 COSSO Method
5.3.2 Adaptive COSSO
5.3.3 Linear and Nonlinear Discoverer (LAND)
5.3.4 Adaptive Group LASSO
5.3.5 Sparse Additive Models (SpAM)
5.3.6 Sparsity-Smoothness Penalty
5.4 Nonparametric Independence Screening (NIS)
References
6 Finding Patterns in Time Series
6.1 Introduction
6.1.1 Regime Descriptors: Local Models
6.1.2 Changepoints
6.1.3 Patterns
6.1.4 Clustering, Classification, and Prediction
6.1.5 Measures of Similarity/Dissimilarity
6.1.6 Outline
6.2 Data Reduction and Changepoints
6.2.1 Piecewise Constant Models
6.2.2 Models with Changing Scales
6.2.3 Trends
6.3 Model Building
6.3.1 Batch Methods
6.3.2 Online Methods
6.4 Model Building: Alternating Trends Smoothing
6.4.1 The Tuning Parameter
6.4.2 Modifications and Extensions
6.5 Bounding Lines
6.6 Patterns
6.6.1 Time Scaling and Junk
6.6.2 Further Data Reduction: Symbolic Representation
6.6.3 Symbolic Trend Patterns (STP)
6.6.4 Patterns in Bounding Lines
6.6.5 Clustering and Classification of Time Series
References
7 Variational Bayes for Hierarchical Mixture Models
7.1 Introduction
7.2 Variational Bayes
7.2.1 Overview of the VB Method
7.2.2 Practicality
7.2.3 Over-Confidence
7.2.4 Simple Two-Component Mixture Model
7.2.5 Marginal Posterior Approximation
7.3 VB for a General Finite Mixture Model
7.3.1 Motivation
7.3.2 The B-LIMMA Model
7.4 Numerical Illustrations
7.4.1 Simulation
7.4.1.1 The B-LIMMA Model
7.4.1.2 A Mixture Model Extended from the LIMMA Model
7.4.1.3 A Mixture Model for Count Data
7.4.2 Real Data Examples
7.4.2.1 APOA1 Data
7.4.2.2 Colon Cancer Data
7.5 Discussion
Appendix: The VB-LEMMA Algorithm
The B-LEMMA Model
Algorithm
The VB-Proteomics Algorithm
The Proteomics Model
Algorithm
References
8 Hypothesis Testing for High-Dimensional Data
8.1 Introduction
8.2 Applications
8.2.1 Testing of Covariance Matrices
8.2.2 Testing of Independence
8.2.3 Analysis of Variance
8.3 Tests Based on L∞ Norms
8.4 Tests Based on L2 Norms
8.5 Asymptotic Theory
8.5.1 Preamble: i.i.d. Gaussian Data
8.5.2 Rademacher Weighted Differencing
8.5.3 Calculating the Power
8.5.4 An Algorithm with General Testing Functionals
8.6 Numerical Experiments
8.6.1 Test of Mean Vectors
8.6.2 Test of Covariance Matrices
8.6.2.1 Sizes Accuracy
8.6.2.2 Power Curve
8.6.3 A Real Data Application
References
9 High-Dimensional Classification
9.1 Introduction
9.2 LDA, Logistic Regression, and SVMs
9.2.1 LDA
9.2.2 Logistic Regression
9.2.3 The Support Vector Machine
9.3 Lasso and Elastic-Net Penalized SVMs
9.3.1 The ℓ1 SVM
9.3.2 The DrSVM
9.4 Lasso and Elastic-Net Penalized Logistic Regression
9.5 Huberized SVMs
9.6 Concave Penalized Margin-Based Classifiers
9.7 Sparse Discriminant Analysis
9.7.1 Independent Rules
9.7.2 Linear Programming Discriminant Analysis
9.7.3 Direct Sparse Discriminant Analysis
9.8 Sparse Semiparametric Discriminant Analysis
9.9 Sparse Penalized Additive Models for Classification
References
10 Analysis of High-Dimensional Regression Models Using Orthogonal Greedy Algorithms
10.1 Introduction
10.2 Convergence Rates of OGA
10.2.1 Random Regressors
10.2.2 The Fixed Design Case
10.3 The Performance of OGA Under General Sparse Conditions
10.3.1 Rates of Convergence
10.3.2 Comparative Studies
10.4 The Performance of OGA in High-Dimensional Time Series Models
References
11 Semi-supervised Smoothing for Large Data Problems
11.1 Introduction
11.2 Semi-supervised Local Kernel Regression
11.2.1 Supervised Kernel Regression
11.2.2 Semi-supervised Kernel Regression with a Latent Response
11.2.3 Adaptive Semi-supervised Kernel Regression
11.2.4 Computational Issues for Large Data
11.3 Optimization Frameworks for Semi-supervised Learning
References
12 Inverse Modeling: A Strategy to Cope with Non-linearity
12.1 Introduction
12.2 SDR and Inverse Modeling
12.2.1 From SIR to PFC
12.2.2 Revisit SDR from an Inverse Modeling Perspective
12.3 Variable Selection
12.3.1 Beyond Sufficient Dimension Reduction: The Necessity of Variable Selection
12.3.2 SIR as a Transformation-Projection Pursuit Problem
12.3.3 COP: Correlation Pursuit
12.3.4 From COP to SIRI
12.3.5 Simulation Study for Variable Selection and SDR Estimation
12.4 Nonparametric Dependence Screening
12.5 Conclusion
References
13 Sufficient Dimension Reduction for Tensor Data
13.1 Curse of Dimensionality
13.2 Sufficient Dimension Reduction
13.3 Tensor Sufficient Dimension Reduction
13.3.1 Tensor Sufficient Dimension Reduction Model
13.3.2 Estimate a Single Direction
13.4 Simulation Studies
13.5 Example
13.6 Discussion
References
14 Compressive Sensing and Sparse Coding
14.1 Leveraging the Sparsity Assumption for Signal Recovery
14.2 From Combinatorial to Convex Optimization
14.3 Dealing with Noisy Measurements
14.4 Other Common Forms and Variations
14.5 The Theory Behind
14.5.1 The Restricted Isometry Property
14.5.2 Guaranteed Signal Recovery
14.5.3 Random Matrix is Good Enough
14.6 Compressive Sensing in Practice
14.6.1 Solving the Compressive Sensing Problem
14.6.2 Sparsifying Basis
14.6.3 Sensing Matrix
14.7 Sparse Coding Overview
14.7.1 Compressive Sensing and Sparse Coding
14.7.1.1 Compressed Domain Feature Extraction
14.7.1.2 Compressed Domain Classification
14.8 Compressive Sensing Extensions
14.8.1 Reconstruction with Additional Information
14.8.2 Compressive Sensing with Distorted Measurements
References
15 Bridging Density Functional Theory and Big Data Analytics with Applications
15.1 Introduction
15.2 Structure of Data Functionals Defined in the DFT Perspectives
15.3 Determinations of Number of Data Groups and the Corresponding Data Boundaries
15.4 Physical Phenomena of the Mixed Data Groups
15.4.1 Physical Structure of the DFT-Based Algorithm
15.4.2 Typical Problem of the Data Clustering: The Fisher's Iris
15.4.3 Tentative Experiments on Dataset of MRI with Brain Tumors
15.5 Conclusion
References
Part III Software
16 Q3-D3-LSA: D3.js and Generalized Vector Space Models for Statistical Computing
16.1 Introduction: From Data to Information
16.1.1 Transparency, Collaboration, and Reproducibility
16.2 Related Work
16.3 Q3-D3 Genesis
16.4 Vector Space Representations
16.4.1 Text to Vector
16.4.2 Weighting Scheme, Similarity, Distance
16.4.3 Shakespeare's Tragedies
16.4.4 Generalized VSM (GVSM)
16.4.4.1 Basic VSM (BVSM)
16.4.4.2 GVSM: Term–Term Correlations
16.4.4.3 GVSM: Latent Semantic Analysis (LSA)
16.4.4.4 Closer Look at the LSA Implementation
16.4.4.5 GVSM Applicability for Big Data
16.5 Methods
16.5.1 Cluster Analysis
16.5.1.1 Partitional Clustering
16.5.1.2 Hierarchical Clustering
16.5.2 Cluster Validation Measures
16.5.2.1 Connectivity
16.5.2.2 Silhouette
16.5.2.3 Dunn Index
16.5.3 Visual Cluster Validation
16.6 Results
16.6.1 Text Preprocessing Results
16.6.2 Sparsity Results
16.6.3 Three Models, Three Methods, Three Measures
16.6.4 LSA Anatomy
16.7 Application
16.8 Outlook
16.8.1 GitHub Mining Infrastructure in R
16.8.2 Future Developments
Appendix
References
17 A Tutorial on Libra: R Package for the Linearized Bregman Algorithm in High-Dimensional Statistics
17.1 Introduction to Libra
17.2 Linear Model
17.2.1 Example: Simulation Data
17.2.2 Example: Diabetes Data
17.3 Logistic Model
17.3.1 Binomial Logistic Model
17.3.1.1 Example: Publications of COPSS Award Winners
17.3.1.2 Example: Journey to the West
17.3.2 Multinomial Logistic Model
17.4 Graphical Model
17.4.1 Gaussian Graphical Model
17.4.1.1 Example: Journey to the West
17.4.2 Ising Model
17.4.2.1 Example: Simulation Data
17.4.2.2 Example: Journey to the West
17.4.2.3 Example: Dream of the Red Chamber
17.4.3 Potts Model
17.5 Discussion
References
Part IV Application
18 Functional Data Analysis for Big Data: A Case Study on California Temperature Trends
18.1 Introduction
18.2 Basic Statistics for Functional Data
18.3 Dimension Reduction for Functional Data
18.4 Functional Principal Component Analysis
18.4.1 Smoothing and Interpolation
18.4.2 Sample Size Considerations
18.5 Functional Variance Process
18.6 Functional Data Analysis for Temperature Trends
18.7 Conclusions
References
19 Bayesian Spatiotemporal Modeling for Detecting Neuronal Activation via Functional Magnetic Resonance Imaging
19.1 Introduction
19.1.1 Emotion Processing Data
19.2 Variable Selection in Bayesian Spatiotemporal Models
19.2.1 Bezener et al.'s (2015) Areal Model
19.2.1.1 Posterior Distribution and MCMC Algorithm
19.2.1.2 Starting Values
19.2.1.3 Emotion Processing Data
19.2.2 Musgrove et al.'s (2015) Areal Model
19.2.2.1 Partitioning the Image
19.2.2.2 Spatial Bayesian Variable Selection with Temporal Correlation
19.2.2.3 Sparse SGLMM Prior
19.2.2.4 Posterior Computation and Inference
19.2.2.5 Emotion Processing Data
19.2.3 Activation Maps for Emotion Processing Data
19.3 Discussion
References
20 Construction of Tight Frames on Graphs and Application to Denoising
20.1 Introduction
20.1.1 Motivation
20.1.2 Relation to Previous Work
20.2 Notation and Basics
20.2.1 Setting
20.2.2 Frames
20.2.3 Neighborhood Graphs
20.2.4 Spectral Graph Theory
20.3 Construction and Properties
20.3.1 Construction of a Tight Graph Frame
20.3.2 Spatial Localization
20.4 Denoising
20.5 Numerical Experiments
20.6 Outlook
Appendix
Proof of Theorem 3
References
21 Beta-Boosted Ensemble for Big Credit Scoring Data
21.1 Introduction
21.2 Method Description
21.2.1 Beta Binomial Distribution
21.2.2 Beta-Boosted Ensemble Model
21.2.3 Toy Example
21.2.4 Relation to Existing Solutions
21.3 Experiments
21.4 Conclusion and Future Work
References