Basic Information
Source name: Building Machine Learning Systems with Python, Second Edition (Building Machine Learning Systems with Python_ Sec.pdf)
Size: 6.90 MB
File format: .pdf
Language: Python
Updated: 2020-02-27
Description

Building Machine Learning Systems with Python_ Sec.pdf

Table of Contents
Preface vii
Chapter 1: Getting Started with Python Machine Learning 1
Machine learning and Python – a dream team 2
What the book will teach you (and what it will not) 3
What to do when you are stuck 4
Getting started 5
Introduction to NumPy, SciPy, and matplotlib 6
Installing Python 6
Chewing data efficiently with NumPy and intelligently with SciPy 6
Learning NumPy 7
Indexing 9
Handling nonexisting values 10
Comparing the runtime 11
Learning SciPy 12
Our first (tiny) application of machine learning 13
Reading in the data 14
Preprocessing and cleaning the data 15
Choosing the right model and learning algorithm 17
Before building our first model… 18
Starting with a simple straight line 18
Towards some advanced stuff 20
Stepping back to go forward – another look at our data 22
Training and testing 26
Answering our initial question 27
Summary 28
Chapter 2: Classifying with Real-world Examples 29
The Iris dataset 30
Visualization is a good first step 30
Building our first classification model 32
Evaluation – holding out data and cross-validation 36
Building more complex classifiers 39
A more complex dataset and a more complex classifier 41
Learning about the Seeds dataset 41
Features and feature engineering 42
Nearest neighbor classification 43
Classifying with scikit-learn 43
Looking at the decision boundaries 45
Binary and multiclass classification 47
Summary 49
Chapter 3: Clustering – Finding Related Posts 51
Measuring the relatedness of posts 52
How not to do it 52
How to do it 53
Preprocessing – similarity measured as a similar number of common words 54
Converting raw text into a bag of words 54
Counting words 55
Normalizing word count vectors 58
Removing less important words 59
Stemming 60
Stop words on steroids 63
Our achievements and goals 65
Clustering 66
K-means 66
Getting test data to evaluate our ideas on 70
Clustering posts 72
Solving our initial challenge 73
Another look at noise 75
Tweaking the parameters 76
Summary 77
Chapter 4: Topic Modeling 79
Latent Dirichlet allocation 80
Building a topic model 81
Comparing documents by topics 86
Modeling the whole of Wikipedia 89
Choosing the number of topics 92
Summary 94
Chapter 5: Classification – Detecting Poor Answers 95
Sketching our roadmap 96
Learning to classify classy answers 96
Tuning the instance 96
Tuning the classifier 96
Fetching the data 97
Slimming the data down to chewable chunks 98
Preselection and processing of attributes 98
Defining what is a good answer 100
Creating our first classifier 100
Starting with kNN 100
Engineering the features 101
Training the classifier 103
Measuring the classifier's performance 103
Designing more features 104
Deciding how to improve 107
Bias-variance and their tradeoff 108
Fixing high bias 108
Fixing high variance 109
High bias or low bias 109
Using logistic regression 112
A bit of math with a small example 112
Applying logistic regression to our post classification problem 114
Looking behind accuracy – precision and recall 116
Slimming the classifier 120
Ship it! 121
Summary 121
Chapter 6: Classification II – Sentiment Analysis 123
Sketching our roadmap 123
Fetching the Twitter data 124
Introducing the Naïve Bayes classifier 124
Getting to know the Bayes' theorem 125
Being naïve 126
Using Naïve Bayes to classify 127
Accounting for unseen words and other oddities 131
Accounting for arithmetic underflows 132
Creating our first classifier and tuning it 134
Solving an easy problem first 135
Using all classes 138
Tuning the classifier's parameters 141
Cleaning tweets 146
Taking the word types into account 148
Determining the word types 148
Successfully cheating using SentiWordNet 150
Our first estimator 152
Putting everything together 155
Summary 156
Chapter 7: Regression 157
Predicting house prices with regression 157
Multidimensional regression 161
Cross-validation for regression 162
Penalized or regularized regression 163
L1 and L2 penalties 164
Using Lasso or ElasticNet in scikit-learn 165
Visualizing the Lasso path 166
P-greater-than-N scenarios 167
An example based on text documents 168
Setting hyperparameters in a principled way 170
Summary 174
Chapter 8: Recommendations 175
Rating predictions and recommendations 175
Splitting into training and testing 177
Normalizing the training data 178
A neighborhood approach to recommendations 180
A regression approach to recommendations 184
Combining multiple methods 186
Basket analysis 188
Obtaining useful predictions 190
Analyzing supermarket shopping baskets 190
Association rule mining 194
More advanced basket analysis 196
Summary 197
Chapter 9: Classification – Music Genre Classification 199
Sketching our roadmap 199
Fetching the music data 200
Converting into a WAV format 200
Looking at music 201
Decomposing music into sine wave components 203
Using FFT to build our first classifier 205
Increasing experimentation agility 205
Training the classifier 207
Using a confusion matrix to measure accuracy in multiclass problems 207
An alternative way to measure classifier performance using receiver-operator characteristics 210
Improving classification performance with Mel Frequency Cepstral Coefficients 214
Summary 218
Chapter 10: Computer Vision 219
Introducing image processing 219
Loading and displaying images 220
Thresholding 222
Gaussian blurring 223
Putting the center in focus 225
Basic image classification 228
Computing features from images 229
Writing your own features 230
Using features to find similar images 232
Classifying a harder dataset 234
Local feature representations 235
Summary 239
Chapter 11: Dimensionality Reduction 241
Sketching our roadmap 242
Selecting features 242
Detecting redundant features using filters 242
Correlation 243
Mutual information 246
Asking the model about the features using wrappers 251
Other feature selection methods 253
Feature extraction 254
About principal component analysis 254
Sketching PCA 255
Applying PCA 255
Limitations of PCA and how LDA can help 257
Multidimensional scaling 258
Summary 262
Chapter 12: Bigger Data 263
Learning about big data 264
Using jug to break up your pipeline into tasks 264
An introduction to tasks in jug 265
Looking under the hood 268
Using jug for data analysis 269
Reusing partial results 272
Using Amazon Web Services 274
Creating your first virtual machines 276
Installing Python packages on Amazon Linux 282
Running jug on our cloud machine 283
Automating the generation of clusters with StarCluster 284
Summary 288
Appendix: Where to Learn More Machine Learning 291
Online courses 291
Books 291
Question and answer sites 292
Blogs 292
Data sources 293
Getting competitive 293
All that was left out 293
Summary 294
Index 295