Data Science Training by Experts
Data Science - Syllabus, Fees & Duration
MODULE 1
- The Data Science Process
- Apply the CRISP-DM process to business applications
- Wrangle, explore, and analyze a dataset
- Apply machine learning for prediction
- Apply statistics for descriptive and inferential understanding
- Draw conclusions that motivate others to act on your results
MODULE 2
- Communicating with Stakeholders
- Implement best practices in sharing your code and written summaries
- Learn what makes a great data science blog
- Learn how to share your ideas with the data science community
MODULE 3
- Software Engineering Practices
- Write clean, modular, and well-documented code
- Refactor code for efficiency
- Create unit tests to test programs
- Write useful programs in multiple scripts
- Track actions and results of processes with logging
- Conduct and receive code reviews
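As a small illustration of the unit-testing and logging topics in this module, here is a minimal sketch that uses only Python's standard library. The function `mean`, the logger name `syllabus_example`, and the test class are hypothetical names chosen for this example, not course material.

```python
import logging
import unittest

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("syllabus_example")


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        logger.error("mean() called with an empty sequence")
        raise ValueError("values must not be empty")
    result = sum(values) / len(values)
    logger.info("Computed mean of %d values: %.3f", len(values), result)
    return result


class MeanTests(unittest.TestCase):
    """Unit tests covering one typical case and one error case."""

    def test_typical_input(self):
        self.assertAlmostEqual(mean([1, 2, 3, 4]), 2.5)

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            mean([])
```

Saved as a module, the tests can be run with `python -m unittest`. Calling `logging.basicConfig(level=logging.INFO)` first makes the log messages visible on the console.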
MODULE 4
- Object-Oriented Programming
- Understand when to use object-oriented programming
- Build and use classes
- Understand magic methods
- Write programs that include multiple classes and follow good code structure
- Learn how large, modular Python packages, such as pandas and scikit-learn, use object-oriented programming
- Portfolio Exercise: Build your own Python package
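To make "magic methods" concrete, the sketch below defines a small class whose instances support `+`, `==`, `abs()`, and a readable `repr()`. The class `Vector2D` is a hypothetical example chosen for illustration and is not taken from the course.

```python
class Vector2D:
    """A tiny 2-D vector used to illustrate Python magic methods."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Called by repr() and by the interactive prompt.
        return f"Vector2D({self.x}, {self.y})"

    def __add__(self, other):
        # Called for `self + other`.
        if not isinstance(other, Vector2D):
            return NotImplemented
        return Vector2D(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        # Called for `self == other`.
        if not isinstance(other, Vector2D):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __abs__(self):
        # Called for abs(self); returns the Euclidean length.
        return (self.x ** 2 + self.y ** 2) ** 0.5


total = Vector2D(1, 2) + Vector2D(3, 4)
print(total)                 # Vector2D(4, 6)
print(abs(Vector2D(3, 4)))   # 5.0
```

Returning `NotImplemented` for unsupported operand types lets Python fall back to its default behaviour for the operator.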
MODULE 5
- Web Development
- Learn about the components of a web app
- Build a web application that uses Flask, Plotly, and the Bootstrap framework
- Portfolio Exercise: Build a data dashboard using a dataset of your choice and deploy it to a web application
MODULE 6
- ETL Pipelines
- Understand what ETL pipelines are
- Access and combine data from CSV, JSON, logs, APIs, and databases
- Standardize encodings and columns
- Normalize data and create dummy variables
- Handle outliers, missing values, and duplicated data
- Engineer new features by running calculations
- Build a SQLite database to store cleaned data
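The extract, transform, and load steps listed in this module can be sketched with the standard library alone. Everything specific here is an assumption made for illustration: the sample CSV text, the column names, the `people` table, and the cleaning rules (trim whitespace, title-case the city, treat a blank age as missing, drop exact duplicates).

```python
import csv
import io
import sqlite3

# Hypothetical raw data with inconsistent casing, a missing value, and a duplicate.
RAW_CSV = """name,city,age
Ana,  lisbon ,34
Ben,Lisbon,
Ana,  lisbon ,34
Cy,PORTO,29
"""


def extract(text):
    """Read CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Standardize fields, mark missing ages as None, and drop duplicate rows."""
    cleaned, seen = [], set()
    for row in rows:
        name = row["name"].strip()
        city = row["city"].strip().title()
        age = int(row["age"]) if row["age"].strip() else None
        record = (name, city, age)
        if record in seen:
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


def load(rows, conn):
    """Store the cleaned rows in a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS people (name TEXT, city TEXT, age INTEGER)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
load(transform(extract(RAW_CSV)), conn)
stored = conn.execute("SELECT name, city, age FROM people ORDER BY name").fetchall()
print(stored)
```

Passing a file path to `sqlite3.connect` instead of `":memory:"` would persist the cleaned table to disk.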
MODULE 7
- Natural Language Processing
- Prepare text data for analysis with tokenization, lemmatization, and removing stop words
- Use scikit-learn to transform and vectorize text data
- Build features with bag of words and tf-idf
- Extract features with tools such as named entity recognition and part of speech tagging
- Build an NLP model to perform sentiment analysis
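The module uses scikit-learn for vectorization. To show the idea behind bag of words and tf-idf without that dependency, the sketch below computes the weights by hand in plain Python. It assumes one simple tf-idf variant (raw term count times `log(N / document frequency)`), which differs from scikit-learn's smoothed default, and it uses a tiny hypothetical stop-word list and whitespace tokenizer.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "is", "and", "was"}


def tokenize(text):
    """Lowercase, strip basic punctuation, and remove stop words."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dictionary per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)  # bag-of-words counts for this document
        weights.append(
            {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return weights


docs = ["The movie was great, great fun.", "The movie was dull."]
weights = tf_idf(docs)
print(weights[0])
```

Under this variant, a term that appears in every document, such as "movie" here, gets a weight of zero, while terms unique to one document are weighted up.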
MODULE 8
- Machine Learning Pipelines
- Understand the advantages of using machine learning pipelines to streamline the data preparation and modeling process
- Chain data transformations and an estimator with scikit-learn's Pipeline
- Use feature unions to perform steps in parallel and create more complex workflows
- Grid search over a pipeline to optimize parameters for the entire workflow
- Complete a case study to build a full machine learning pipeline that prepares data and creates a model for a dataset
MODULE 9
- Experiment Design
- Understand how to set up an experiment, and the ideas associated with experiments vs. observational studies
- Define control and test conditions
- Choose control and test groups
MODULE 10
- Statistical Concerns of Experimentation
- Applications of statistics in the real world
- Establishing key metrics
- SMART experiments: Specific, Measurable, Actionable, Realistic, Timely
MODULE 11
- A/B Testing
- How it works and its limitations
- Sources of Bias: Novelty and Recency Effects
- Multiple Comparison Techniques (FDR, Bonferroni, Tukey)
- Portfolio Exercise: Use a technical screener from Starbucks to analyze the results of an experiment and write up your findings
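Two of the multiple-comparison techniques named in this module can be sketched in a few lines of plain Python: the Bonferroni correction and the Benjamini-Hochberg procedure for controlling the false discovery rate (FDR). The function names and the example p-values are hypothetical, and Tukey's method is not covered here.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for FDR control."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at most (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below that k.
    reject = [False] * m
    for idx in order[:max_k]:
        reject[idx] = True
    return reject


# Hypothetical p-values from four metrics in one experiment.
p_values = [0.001, 0.012, 0.03, 0.2]
print(bonferroni(p_values))          # [True, True, False, False]
print(benjamini_hochberg(p_values))  # [True, True, True, False]
```

In this example the Bonferroni correction is the more conservative of the two: it rejects two hypotheses where Benjamini-Hochberg rejects three.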
MODULE 12
- Introduction to Recommendation Engines
- Distinguish between common techniques for creating recommendation engines, including knowledge-based, content-based, and collaborative-filtering-based methods
- Implement each of these techniques in Python
- List business goals associated with recommendation engines, and recognize which of these goals are most easily met with existing recommendation techniques
MODULE 13
- Matrix Factorization for Recommendations
- Understand the pitfalls of traditional methods and of measuring the influence of recommendation engines with traditional regression and classification techniques
- Create recommendation engines using matrix factorization and FunkSVD
- Interpret the results of matrix factorization to better understand latent features of customer data
- Identify common pitfalls of recommendation engines, such as the cold start problem and the difficulty of assessing their effectiveness with the usual techniques, along with potential solutions
This syllabus is not final and can be customized as needs change or updates arise.