Basic Information
Resource name: Building Machine Learning Systems with Python, Second Edition (Building Machine Learning Systems with Python_ Sec.pdf)
File size: 6.90 MB
File format: .pdf
Language: Python
Last updated: 2020-02-27
Resource Description
Building Machine Learning Systems with Python, Second Edition (PDF)
Table of Contents

Preface vii

Chapter 1: Getting Started with Python Machine Learning 1
  Machine learning and Python – a dream team 2
  What the book will teach you (and what it will not) 3
  What to do when you are stuck 4
  Getting started 5
  Introduction to NumPy, SciPy, and matplotlib 6
  Installing Python 6
  Chewing data efficiently with NumPy and intelligently with SciPy 6
  Learning NumPy 7
  Indexing 9
  Handling nonexisting values 10
  Comparing the runtime 11
  Learning SciPy 12
  Our first (tiny) application of machine learning 13
  Reading in the data 14
  Preprocessing and cleaning the data 15
  Choosing the right model and learning algorithm 17
  Before building our first model… 18
  Starting with a simple straight line 18
  Towards some advanced stuff 20
  Stepping back to go forward – another look at our data 22
  Training and testing 26
  Answering our initial question 27
  Summary 28

Chapter 2: Classifying with Real-world Examples 29
  The Iris dataset 30
  Visualization is a good first step 30
  Building our first classification model 32
  Evaluation – holding out data and cross-validation 36
  Building more complex classifiers 39
  A more complex dataset and a more complex classifier 41
  Learning about the Seeds dataset 41
  Features and feature engineering 42
  Nearest neighbor classification 43
  Classifying with scikit-learn 43
  Looking at the decision boundaries 45
  Binary and multiclass classification 47
  Summary 49

Chapter 3: Clustering – Finding Related Posts 51
  Measuring the relatedness of posts 52
  How not to do it 52
  How to do it 53
  Preprocessing – similarity measured as a similar number of common words 54
  Converting raw text into a bag of words 54
  Counting words 55
  Normalizing word count vectors 58
  Removing less important words 59
  Stemming 60
  Stop words on steroids 63
  Our achievements and goals 65
  Clustering 66
  K-means 66
  Getting test data to evaluate our ideas on 70
  Clustering posts 72
  Solving our initial challenge 73
  Another look at noise 75
  Tweaking the parameters 76
  Summary 77

Chapter 4: Topic Modeling 79
  Latent Dirichlet allocation 80
  Building a topic model 81
  Comparing documents by topics 86
  Modeling the whole of Wikipedia 89
  Choosing the number of topics 92
  Summary 94

Chapter 5: Classification – Detecting Poor Answers 95
  Sketching our roadmap 96
  Learning to classify classy answers 96
  Tuning the instance 96
  Tuning the classifier 96
  Fetching the data 97
  Slimming the data down to chewable chunks 98
  Preselection and processing of attributes 98
  Defining what is a good answer 100
  Creating our first classifier 100
  Starting with kNN 100
  Engineering the features 101
  Training the classifier 103
  Measuring the classifier's performance 103
  Designing more features 104
  Deciding how to improve 107
  Bias-variance and their tradeoff 108
  Fixing high bias 108
  Fixing high variance 109
  High bias or low bias 109
  Using logistic regression 112
  A bit of math with a small example 112
  Applying logistic regression to our post classification problem 114
  Looking behind accuracy – precision and recall 116
  Slimming the classifier 120
  Ship it! 121
  Summary 121

Chapter 6: Classification II – Sentiment Analysis 123
  Sketching our roadmap 123
  Fetching the Twitter data 124
  Introducing the Naïve Bayes classifier 124
  Getting to know the Bayes' theorem 125
  Being naïve 126
  Using Naïve Bayes to classify 127
  Accounting for unseen words and other oddities 131
  Accounting for arithmetic underflows 132
  Creating our first classifier and tuning it 134
  Solving an easy problem first 135
  Using all classes 138
  Tuning the classifier's parameters 141
  Cleaning tweets 146
  Taking the word types into account 148
  Determining the word types 148
  Successfully cheating using SentiWordNet 150
  Our first estimator 152
  Putting everything together 155
  Summary 156

Chapter 7: Regression 157
  Predicting house prices with regression 157
  Multidimensional regression 161
  Cross-validation for regression 162
  Penalized or regularized regression 163
  L1 and L2 penalties 164
  Using Lasso or ElasticNet in scikit-learn 165
  Visualizing the Lasso path 166
  P-greater-than-N scenarios 167
  An example based on text documents 168
  Setting hyperparameters in a principled way 170
  Summary 174

Chapter 8: Recommendations 175
  Rating predictions and recommendations 175
  Splitting into training and testing 177
  Normalizing the training data 178
  A neighborhood approach to recommendations 180
  A regression approach to recommendations 184
  Combining multiple methods 186
  Basket analysis 188
  Obtaining useful predictions 190
  Analyzing supermarket shopping baskets 190
  Association rule mining 194
  More advanced basket analysis 196
  Summary 197

Chapter 9: Classification – Music Genre Classification 199
  Sketching our roadmap 199
  Fetching the music data 200
  Converting into a WAV format 200
  Looking at music 201
  Decomposing music into sine wave components 203
  Using FFT to build our first classifier 205
  Increasing experimentation agility 205
  Training the classifier 207
  Using a confusion matrix to measure accuracy in multiclass problems 207
  An alternative way to measure classifier performance using receiver-operator characteristics 210
  Improving classification performance with Mel Frequency Cepstral Coefficients 214
  Summary 218

Chapter 10: Computer Vision 219
  Introducing image processing 219
  Loading and displaying images 220
  Thresholding 222
  Gaussian blurring 223
  Putting the center in focus 225
  Basic image classification 228
  Computing features from images 229
  Writing your own features 230
  Using features to find similar images 232
  Classifying a harder dataset 234
  Local feature representations 235
  Summary 239

Chapter 11: Dimensionality Reduction 241
  Sketching our roadmap 242
  Selecting features 242
  Detecting redundant features using filters 242
  Correlation 243
  Mutual information 246
  Asking the model about the features using wrappers 251
  Other feature selection methods 253
  Feature extraction 254
  About principal component analysis 254
  Sketching PCA 255
  Applying PCA 255
  Limitations of PCA and how LDA can help 257
  Multidimensional scaling 258
  Summary 262

Chapter 12: Bigger Data 263
  Learning about big data 264
  Using jug to break up your pipeline into tasks 264
  An introduction to tasks in jug 265
  Looking under the hood 268
  Using jug for data analysis 269
  Reusing partial results 272
  Using Amazon Web Services 274
  Creating your first virtual machines 276
  Installing Python packages on Amazon Linux 282
  Running jug on our cloud machine 283
  Automating the generation of clusters with StarCluster 284
  Summary 288

Appendix: Where to Learn More Machine Learning 291
  Online courses 291
  Books 291
  Question and answer sites 292
  Blogs 292
  Data sources 293
  Getting competitive 293
  All that was left out 293
  Summary 294

Index 295