
Beginner

I want to learn about data science and AI


Conferences, Training and Webcasts

Resources for getting started and learning about AI.

Use Cases

Explore various EPRI data science use cases here.

External Resources

Additional data science resources.

1 - Conferences, Training and Webcasts

Resources for learning and getting started

1.1 - Conferences

Conferences related to data science and A.I.


Conference Series

Predictive Analytics World: The “premier machine learning conference.” Tracks in industry, deep learning, business, and more. Includes speakers, workshops, and networking opportunities.

Machine Learning and Data Mining (MLDM): Brings together researchers in machine learning and data mining. The conference includes talks, workshops, tutorials, and an exhibition.

AI & Big Data Expo: Showcases next-generation technologies and strategies in AI and big data. Tracks include Enterprise AI & Digital Transformation, Data Analytics for AI & IoT, Big Data Strategies, and more.

1.2 - Training

Training material for data science



Learn R, Python & Data Science Online

Learn Data Science from the comfort of your browser, at your own pace with DataCamp’s video tutorials & coding challenges on R, Python, Statistics & more.

Explore Our Programs and Courses | Udacity Catalog

Get the latest tech skills to advance your career. Browse Nanodegree programs in AI, automated systems & robotics, data science, programming and business.

Explore Our Programs and Courses | Udacity Catalog

AI, Analytics, Data Science, and Machine Learning Courses, online and on-site.

1.3 - Videos

Videos on Data Science


Top 5 Videos on Data Science You Must Watch | Jigsaw Academy

Self-paced learning is the new trend and we must say, it’s really effective. With the evolution of technology, one of the benefits that have reached the common man is the exposure to educational content and information that will help him or her evolve as a person. With YouTube, this has gone to the …

Data Science for Beginners - 5 Questions Data Science Answers

Get a quick introduction to data science from Data Science for Beginners in five short videos. This video series is helpful if you’re interested in doing data science - or work with people who do data

Artificial Intelligence | What Is AI | Introduction to Artificial Intelligence | Edureka

(TensorFlow Training: https://www.edureka.co/ai-deep-learning-with-tensorflow) This video on Artificial Intelligence gives you a brief introduction to AI a….

Data Science for Beginners Video 2: Is Your Data Ready for Data Science?

Learn about evaluating your data to make sure it meets some basic criteria so that it’s ready for data science. This second video in the Data Science for Beginners series h…


Data Science Tutorial | Data Science for Beginners | Data Science With Python Tutorial | Simplilearn

This Data Science Tutorial will help you understand what Data Science is, who a Data Scientist is, what a Data Scientist does, and how Python is used …

2 - Use Cases

Evaluating distribution system reliability and resiliency investments


Case Study
How metadata can help

Metadata schema examples that help others at EPRI know what kind of data is available for them to use in research and how to search for this data.

Case Study
Customized data analytics process for PDU

This is a high-level overview of the journey that PDU and its members are likely to follow for Data Analytics.

Case Study
New insights through visualization

Data visualizations that highlight new insights. This use case took data from a predictor tool in Generation to discover new insights with visualizations.

Case Study
Engaging members with data science

This use case shared preliminary data science findings with members to demonstrate a different view of their metrics.

Find more information on how to Leverage Data Science Smartly here.

2.1 - How metadata can help

Metadata schema examples that help others at EPRI know what kind of data is available for them to use in research and how to search for this data.
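As a rough illustration of how a metadata record makes data findable, the sketch below describes datasets with a small record type and searches it by keyword. The `DatasetMetadata` fields, the `search` function, and the two catalog entries are hypothetical choices for this example, not EPRI's actual metadata schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; the fields are illustrative, not an EPRI schema."""
    title: str
    owner: str                 # team or program responsible for the data
    description: str
    keywords: List[str] = field(default_factory=list)
    time_coverage: str = ""    # free text such as "2015-2020"


def search(catalog: List[DatasetMetadata], term: str) -> List[DatasetMetadata]:
    """Return records whose title, description, or keywords contain `term` (case-insensitive)."""
    needle = term.lower()
    return [
        rec for rec in catalog
        if needle in rec.title.lower()
        or needle in rec.description.lower()
        or any(needle in kw.lower() for kw in rec.keywords)
    ]


# Two made-up catalog entries, for illustration only.
catalog = [
    DatasetMetadata("Feeder outage events", "Distribution", "Outage records by feeder",
                    ["reliability", "outages"], "2015-2020"),
    DatasetMetadata("Boiler ash chemistry", "Generation", "Ash composition samples",
                    ["coal", "ash", "FeS"]),
]
print([rec.title for rec in search(catalog, "ash")])  # ['Boiler ash chemistry']
```

Even a schema this small lets a researcher find data without knowing who owns it; a real schema would likely add fields such as location, units, and access restrictions.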


2.2 - Customized data analytics process for PDU

This is a high-level overview of the journey that PDU and its members are likely to follow for Data Analytics.

Unique to Electric Distribution & Utilization

Ingest-to-store

  • Decisions that consumers make have a huge impact in aggregate
  • The network of intelligent devices is growing with different protocols and data formats, where the same event is measured in different ways
  • At ingestion, it is critical not to throw out any data
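The ingestion points above can be sketched as follows, assuming two hypothetical device payload formats that report the same quantity differently (watts with an epoch timestamp vs. kilowatts with an ISO 8601 time). The `normalize` function and the field names are illustrative; the key idea is that each record is mapped onto one schema while the raw payload is kept, and unrecognized payloads are flagged rather than dropped.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass
class Reading:
    device_id: str
    timestamp: Optional[datetime]  # UTC; None if the payload had no usable time
    kw: Optional[float]            # None if the value could not be interpreted
    raw: Dict[str, Any]            # untouched original payload, so nothing is thrown out


def normalize(payload: Dict[str, Any]) -> Reading:
    """Map two hypothetical device formats onto one schema, keeping the raw record."""
    if "watts" in payload:   # hypothetical format A: epoch seconds + watts
        ts = datetime.fromtimestamp(payload["ts"], tz=timezone.utc)
        kw = payload["watts"] / 1000.0
    elif "kW" in payload:    # hypothetical format B: ISO 8601 time (with UTC offset) + kilowatts
        ts = datetime.fromisoformat(payload["time"]).astimezone(timezone.utc)
        kw = float(payload["kW"])
    else:                    # unknown format: keep it and flag it for review
        ts, kw = None, None
    return Reading(str(payload.get("id", "unknown")), ts, kw, dict(payload))


a = normalize({"id": "m1", "ts": 1704067200, "watts": 1500})
b = normalize({"id": "m2", "time": "2024-01-01T00:00:00+00:00", "kW": 1.5})
print(a.timestamp == b.timestamp, a.kw == b.kw)  # True True
```

Keeping `raw` alongside the normalized fields means a parsing mistake can be corrected later by re-running `normalize`, instead of being baked into the stored data.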

Cleanse & Data Prep

  • Data cleansing and prep require a large investment of time to curate and normalize the data. Expect information gaps in the data records
  • Time-series data is not a snapshot in time; it changes by day, week, month, and year
  • Determine how sensors collect information: continuous monitoring or event-based monitoring
  • Given the massive amounts of data, start with a portfolio of prototype projects that can be scaled up
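One way to surface the information gaps mentioned above is to compare each interval between readings with the typical sampling interval. This minimal sketch treats the median interval as the expected period, which assumes continuous (regularly sampled) monitoring; the `find_gaps` name and the 1.5x `tolerance` are illustrative assumptions, not a prescribed method.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List, Tuple


def find_gaps(timestamps: List[datetime], tolerance: float = 1.5) -> List[Tuple[datetime, datetime]]:
    """Return (before, after) pairs where spacing exceeds `tolerance` x the median interval.

    The median interval is treated as the expected sampling period, which only makes
    sense for constant monitoring; an event-based sensor will show many "gaps" that
    are not missing data.
    """
    ts = sorted(timestamps)
    if len(ts) < 3:
        return []
    deltas = [later - earlier for earlier, later in zip(ts, ts[1:])]
    expected = median(deltas)
    return [
        (earlier, later)
        for earlier, later, gap in zip(ts, ts[1:], deltas)
        if gap > expected * tolerance
    ]


# 15-minute readings with two missing intervals (illustrative data)
start = datetime(2024, 1, 1)
stamps = [start + timedelta(minutes=15 * i) for i in range(8) if i not in (3, 4)]
print(find_gaps(stamps))  # one gap, from 00:30 to 01:15
```

The median is used instead of the mean so that a few long outages do not inflate the expected interval and hide themselves.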

Visualize-to-Analyze

  • Visualize the data to make it easier to see patterns, identify important variables and develop a plan for modeling
  • Select manageable analytic data sets to use for data modeling
  • Analyzing data at different resolutions and sampling rates may be useful when the data is sparse
  • Build several analytic models to help characterize and understand end-use load profiles and extract insights
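The sketch below illustrates two of these steps on hypothetical interval data: resampling to a coarser resolution (for example 15-minute readings to hourly means) and building a simple average load profile by hour of day. Both function names are made up for this example, and a real analysis would more likely use a dataframe library; the standard library is used here only to keep the sketch self-contained.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Tuple

Series = List[Tuple[datetime, float]]


def resample_mean(readings: Series, minutes: int) -> Dict[datetime, float]:
    """Average readings into fixed buckets, e.g. 15-minute data into hourly means.

    Assumes `minutes` divides a day evenly (15, 30, 60, 120, ...).
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        offset = (ts.hour * 60 + ts.minute) % minutes  # minutes past the bucket start
        bucket = ts.replace(second=0, microsecond=0) - timedelta(minutes=offset)
        buckets[bucket].append(value)
    return {bucket: mean(vals) for bucket, vals in sorted(buckets.items())}


def hourly_profile(readings: Series) -> Dict[int, float]:
    """Average load for each hour of the day across all days (a simple load profile)."""
    by_hour = defaultdict(list)
    for ts, value in readings:
        by_hour[ts.hour].append(value)
    return {hour: mean(vals) for hour, vals in sorted(by_hour.items())}


# Illustrative 15-minute readings for hours 0 and 1 on two days
readings = [
    (datetime(2024, 1, day, hour, minute), float(day + hour))
    for day in (1, 2) for hour in (0, 1) for minute in (0, 15, 30, 45)
]
print(len(resample_mean(readings, 60)))  # 4 hourly buckets
print(hourly_profile(readings))          # {0: 1.5, 1: 2.5}
```

Coarser buckets trade detail for fewer empty periods, which is why trying several resolutions can help when the data is sparse.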

Find more information here.

2.3 - New insights through visualization

Data visualizations that highlight new insights. This use case took data from a predictor tool in Generation to discover new insights with visualizations.


Visualizations help to identify patterns in data to discover and understand stories.

Relationship between input and output

Target = FeS in Ash (% weight)

These charts show a mixed relationship with FeS in Ash (% weight). There may be two (or more) patterns.

Carbon is an important predictor in the model

  • Red plots are histograms.
  • Blue plots show the scatter relationship between the two variables at each intersection in the grid. For example, the bottom row, second column plots the relationship between Fe2O3 and FeS in Ash.
  • The histograms for each of these parameters suggest that there might be two groups in each dimension instead of one, but the scatterplot shows a fairly strong correlation.
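When a histogram hints at two groups, one simple way to test that hunch is to split the variable into two groups with one-dimensional k-means (k = 2) and inspect the result. This is a sketch of that general technique, not the method used in this use case; the `two_means_1d` name and the sample values are illustrative assumptions.

```python
from statistics import mean
from typing import List, Tuple


def _assign(values: List[float], c_low: float, c_high: float) -> Tuple[List[float], List[float]]:
    """Assign each value to the nearer of two centers (ties go to the low group)."""
    low = [v for v in values if abs(v - c_low) <= abs(v - c_high)]
    high = [v for v in values if abs(v - c_low) > abs(v - c_high)]
    return low, high


def two_means_1d(values: List[float], max_iter: int = 50) -> Tuple[List[float], List[float]]:
    """Split one variable into a low and a high group with 1-D k-means (k = 2).

    Centers start at the minimum and maximum, then alternate between assigning each
    value to its nearest center and moving each center to the mean of its group.
    """
    c_low, c_high = min(values), max(values)
    if c_low == c_high:  # a constant column has only one group
        return list(values), []
    low, high = _assign(values, c_low, c_high)
    for _ in range(max_iter):
        c_low, c_high = mean(low), mean(high)
        new_low, new_high = _assign(values, c_low, c_high)
        if new_low == low:  # assignments stopped changing
            break
        low, high = new_low, new_high
    return low, high


# Illustrative values only (not data from the Generation predictor tool)
low, high = two_means_1d([1.0, 1.2, 0.9, 1.1, 4.8, 5.1, 5.0, 4.9])
print(low, high)  # [1.0, 1.2, 0.9, 1.1] [4.8, 5.1, 5.0, 4.9]
```

k-means always returns two groups, even for data with a single peak, so the split is only meaningful if the groups also differ clearly when plotted or summarized.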

Cluster Comparison - Scale

Box plots show the median surrounded by the interquartile range (25th to 75th percentile). The two clusters are well separated.
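The quantities a box plot draws can be computed directly. The sketch below returns the median and interquartile range (IQR) of a group and uses non-overlapping IQR boxes as one rough, assumed criterion for "well separated"; the function names and the sample clusters are illustrative, and the use case does not state which criterion it applied.

```python
from statistics import quantiles
from typing import Dict, List


def box_stats(values: List[float]) -> Dict[str, float]:
    """Median and interquartile range, the quantities a box plot draws."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": q2, "q3": q3, "iqr": q3 - q1}


def boxes_separated(a: List[float], b: List[float]) -> bool:
    """True when the two interquartile boxes do not overlap at all."""
    sa, sb = box_stats(a), box_stats(b)
    return sa["q3"] < sb["q1"] or sb["q3"] < sa["q1"]


cluster_1 = [1, 2, 3, 4, 5]      # illustrative values only
cluster_2 = [7, 8, 9, 10, 11]
print(box_stats(cluster_1))                   # {'q1': 2.0, 'median': 3.0, 'q3': 4.0, 'iqr': 2.0}
print(boxes_separated(cluster_1, cluster_2))  # True
```

Different quantile conventions (`method="inclusive"` vs. the default `"exclusive"`) give slightly different quartiles on small samples, so it is worth matching whatever convention the plotting tool uses.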

Dashboard example 1

Dashboard example 2

Learn more here.

2.4 - Engaging members with data science

This use case shared preliminary data science findings with members to demonstrate a different view of their metrics.


Opportunity to use data science to share a different point of view of their data with members.

Multivariate analysis: Provide members with a more sophisticated understanding of the benchmark data by applying multivariate analysis. Discover the interrelationships between multiple variables in the study.
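A common first step in a multivariate look at benchmark data is to compute the correlation between every pair of variables and rank the pairs by strength. The sketch below does this with a hand-written Pearson correlation so it has no dependencies; the variable names and values are hypothetical, and it assumes no variable is constant (which would make r undefined).

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient r (assumes neither variable is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_pairs(table: Dict[str, List[float]]) -> List[Tuple[str, str, float]]:
    """r for every pair of variables, strongest relationship (largest |r|) first."""
    pairs = [(x, y, pearson(table[x], table[y])) for x, y in combinations(table, 2)]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Hypothetical benchmark metrics for four members
table = {
    "water_use": [1.0, 2.0, 3.0, 4.0],
    "emissions": [2.0, 4.0, 6.0, 8.0],
    "recycling": [1.0, 3.0, 2.0, 4.0],
}
for x, y, r in correlation_pairs(table):
    print(f"{x} vs {y}: r = {r:.2f}")
```

A strong correlation only says two metrics move together; it does not establish which one drives the other, so ranked pairs are a starting point for hypotheses rather than conclusions.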

Predictive analysis: Members are interested in how their sustainability efforts interrelate. There is an opportunity to explore which factors may drive, or simply be correlated with, the sustainability variables in the study.

External data: Provide members with greater context for the metrics by incorporating external data to understand how climate, demographics, GIS/mapping, and plant operations data may affect sustainability results. For example, the EPA has emissions and water flow data at a high level of detail that can inform the member’s sustainability strategy.
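Bringing in external context usually comes down to joining it to member metrics on a shared identifier. The sketch below performs a left join on a hypothetical `plant_id` key, so every member row is kept even when no external record matches; all column names and values are made up for illustration and do not reflect the structure of any EPA dataset.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def left_join(members: List[Row], external: List[Row], key: str, prefix: str = "ext_") -> List[Row]:
    """Attach external columns to each member row that shares the same `key` value.

    Every member row is kept even without a match (a left join), and external columns
    are prefixed so they cannot overwrite member metrics. Assumes at most one external
    row per key; if there are duplicates, the last one wins.
    """
    lookup = {row[key]: row for row in external}
    joined = []
    for row in members:
        match = lookup.get(row[key], {})
        extra = {prefix + col: val for col, val in match.items() if col != key}
        joined.append({**row, **extra})
    return joined


# Hypothetical member metrics and external context, joined on a plant identifier
members = [{"plant_id": "P1", "water_intensity": 0.8}, {"plant_id": "P2", "water_intensity": 1.1}]
external = [{"plant_id": "P1", "county_precip_in": 42.0}]
print(left_join(members, external, "plant_id"))
```

In practice the hardest part is usually agreeing on the join key (for example, facility identifiers that differ between member and government datasets), not the join itself.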

Facilitate a workshop with members

Engaging members with data science

Member engagement: The TI project gave the Sustainability team an opportunity to start a dialogue with members about their hypotheses and the metrics that matter most to them. The workshop with members generated 12 to 15 hypotheses to explore.

Predictive variables: Through our analysis, we identified variables that might be better predictors of future performance and tagged them as valid and invalid variables.

SPSS Training: Provided Morgan Scott and her analyst with training on SPSS Modeler so that they can continue to explore her data without restrictions and pursue the issues of interest that the members identified during the workshop.

3 - External Resources

Additional data resources to supplement your data science needs.

3.1 - Data for Use in R&D

Additional data resources to supplement your data science needs.


Air Quality System (AQS) | US EPA

The Air Quality System (AQS) is EPA’s repository of ambient air quality data. AQS stores data from over 10,000 monitors, 5000 of which are currently active.


Atmospheric Deposition and Critical Loads Data

NADP Maps and Data

National Atmospheric Deposition Program


EPA’s Critical Loads Mapper Tool enables access to information on atmospheric deposition, critical loads, and their exceedances to better understand vulnerability to atmospheric pollution.

CLAD Database Access

National Atmospheric Deposition Program

Emissions

National Emissions Inventory (NEI) | US EPA

Detailed estimate of air emissions of both criteria and hazardous air pollutants from all air emissions sources.

Toxics Release Inventory (TRI) Program | US EPA

The Toxics Release Inventory tracks the management of certain toxic chemicals that may pose a threat to human health and the environment.

Satellite Air Quality Measurements

https://earthdata.nasa.gov/earth-observation-data/near-real-time/hazards-and-disasters/air-quality

https://www.usgs.gov/centers/eros/science/national-land-cover-database

https://www.nodc.noaa.gov/ocads/oceans/

https://www.ncdc.noaa.gov/cdo-web/datatools/lcd

Meteorology

https://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/automated-surface-observing-system-asos

https://www.nrel.gov/analysis/jedi/index.html

Using JEDI (the Jobs and Economic Development Impact models), you can analyze the economic impacts of wind, biofuels, concentrating solar power, geothermal, marine and hydrokinetic power, coal, and natural gas power plants.

https://openei.org/wiki/Utility_Rate_Database

Rate structure information from utilities in the U.S. maintained by the U.S. Department of Energy

3.2 - Tools, Tips & Cheat Sheets

Additional data resources to supplement your data science needs.

Tools to Help You Get Started

A. 3 Major Steps to a Data Science Project

B. Data Analytics Pyramid

C. Data Science Project Scorecard

D. 10 Questions to Understand Your Data

E. Data Analytics Lifecycle with Questionnaire

F. Data Analytics Planning Template

3 Major Steps to a Data Science Project



  1. Planning

    • Pose your operational problem as a data question
    • Make sure you have a good problem to solve with data science
    • Determine where your project falls in the lifecycle
  2. Execution

    • Stage the data
    • Select an analytic approach
    • Do the analytic work
    • Hand off the patterns to the Action Team
    • Communicate the benefits to stakeholders
  3. Action Team

    • Share insights with your Action Team
    • The Action Team will use them to implement change

Data Analytics Pyramid

Data Science Project Scorecard

10 Questions to Understand Your Data

Data Analytics Lifecycle with Questionnaire

Data Analytics Planning Template