There are many use cases where your data is larger than a single machine can handle, and you may not have experience with Hadoop, MapReduce, Spark, or other big data tools. This talk explores approaches to working with "medium" and "large" datasets from the perspective of whoever is doing the analysis, typically a data scientist or data analyst. We offer practical considerations for working with large datasets in R and detail some cloud technologies for big data that you can connect to from R, such as Spark via sparklyr (which is easy to use in the cloud with Azure Databricks).
About the Speaker:
Marck is a Technical Specialist at Microsoft, where he helps U.S. Federal Government customers adopt the Azure platform for Data Science, Big Data, Advanced Analytics, and Artificial Intelligence workloads. His expertise lies in making data work for the problem at hand, drawing on experience across multiple industries, including Internet, telecommunications, and high tech. Marck is an experienced R programmer and advocate. He co-founded Data Community DC, an organization that supports Data Science and Analytics practitioners in the Washington, D.C. area, and he teaches graduate-level courses at Georgetown University and the George Washington University. Marck grew up in Caracas, Venezuela, speaks fluent Spanish, and previously lived in South Florida.