
Certkingdom's preparation material is prepared by dedicated experts who have come together to offer an integrated solution. We provide a simple method for passing your certification exams on the first attempt, guaranteed.
              
Whether you want to improve your skills, expertise, or career growth, Certkingdom's training and certification resources help you achieve your goals. Our exam files feature hands-on tasks and real-world scenarios; in just a matter of days, you'll be more productive and embracing new technology standards. Our online resources and events enable you to focus on learning just what you want on your own timeframe. You get access to every exam file, and we continuously update our study materials; these exam updates are supplied free of charge to our valued customers. Get the best 70-775 exam training as you study from our exam files: "Best Materials, Great Results".
                 
              
70-775
  Perform Data Engineering on Microsoft Azure HDInsight
  
  Published: February 22, 2017
  Languages: English
  Audiences: Data scientists
  Technology: Azure HDInsight
  Credit toward certification: MCSE 
  
  Skills measured
  This exam measures your ability to accomplish the technical tasks listed below. View video tutorials about the variety of question types on Microsoft exams.
  
  Please note that the questions may test on, but will not be limited to, the topics described in the bulleted text.
  
  Do you have feedback about the relevance of the skills measured on this exam? Please send Microsoft your comments. All feedback will be reviewed and incorporated as appropriate while still maintaining the validity and reliability of the certification process. Note that Microsoft will not respond directly to your feedback. We appreciate your input in ensuring the quality of the Microsoft Certification program.
  
  If you have concerns about specific questions on this exam, please submit an exam challenge.
  
  If you have other questions or feedback about Microsoft Certification exams or about the certification program, registration, or promotions, please contact your Regional Service Center.
  
  Administer and Provision HDInsight Clusters
  Deploy HDInsight clusters
  Create a cluster in a private virtual network; create a cluster that has a custom metastore; create a domain-joined cluster; select an appropriate cluster type based on workload considerations; customize a cluster by using script actions; provision a cluster by using the Azure portal; provision a cluster by using Azure CLI tools; provision a cluster by using Azure Resource Manager (ARM) templates and PowerShell; manage managed disks; configure VNet peering
  Deploy and secure multi-user HDInsight clusters
  Provision users who have different roles; manage users, groups, and permissions through Apache Ambari, PowerShell, and Apache Ranger; configure Kerberos; configure service accounts; implement SSH tunneling; restrict access to data
  Ingest data for batch and interactive processing
  Ingest data from cloud or on-premises data; store data in Azure Data Lake; store data in Azure Blob Storage; perform routine small writes on a continuous basis using Azure CLI tools; ingest data in Apache Hive and Apache Spark by using Apache Sqoop, Azure Data Factory (ADF), AzCopy, and AdlCopy; ingest data from an on-premises Hadoop cluster
  Configure HDInsight clusters
  Manage metastore upgrades; view and edit Ambari configuration groups; view and change service configurations through Ambari; access logs written to Azure Table storage; enable heap dumps for Hadoop services; manage HDInsight configuration by using the HDInsight .NET SDK and PowerShell; perform cluster-level debugging; stop and start services through Ambari; manage Ambari alerts and metrics
  Manage and debug HDInsight jobs
  Describe YARN architecture and operation; examine YARN jobs through ResourceManager UI and review running applications; use YARN CLI to kill jobs; find logs for different types of jobs; debug Hadoop and Spark jobs; use Azure Operations Management Suite (OMS) to monitor and manage alerts, and perform predictive actions
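As a study aid for the YARN CLI items above, here is a minimal sketch that builds the argument lists for three standard `yarn` subcommands (list running applications, kill an application, fetch aggregated logs). The helper names and the sample application ID are hypothetical; the sketch only assembles the commands and does not contact a cluster.

```python
import re

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"^application_\d+_\d+$")


def _check_app_id(app_id):
    """Reject strings that do not look like a YARN application ID."""
    if not APP_ID_PATTERN.match(app_id):
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    return app_id


def yarn_list_running():
    """argv for listing applications currently in the RUNNING state."""
    return ["yarn", "application", "-list", "-appStates", "RUNNING"]


def yarn_kill(app_id):
    """argv for killing one application."""
    return ["yarn", "application", "-kill", _check_app_id(app_id)]


def yarn_logs(app_id):
    """argv for fetching aggregated logs of one application."""
    return ["yarn", "logs", "-applicationId", _check_app_id(app_id)]


if __name__ == "__main__":
    sample_id = "application_1500000000000_0001"  # hypothetical ID
    for argv in (yarn_list_running(), yarn_kill(sample_id), yarn_logs(sample_id)):
        print(" ".join(argv))
```

On a cluster head node, each list could be passed to `subprocess.run`. Note that `yarn logs` returns output only when log aggregation is enabled on the cluster.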
  
  Implement Big Data Batch Processing Solutions
  Implement batch solutions with Hive and Apache Pig
  Define external Hive tables; load data into a Hive table; use partitioning and bucketing to improve Hive performance; use semi-structured files such as XML and JSON with Hive; join tables with Hive using shuffle joins and broadcast joins; invoke Hive UDFs with Java and Python; design scripts with Pig; identify query bottlenecks using the Hive query graph; identify the appropriate storage format, such as Apache Parquet, ORC, Text, and JSON
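To make the external-table, partitioning, and bucketing items above concrete, the following sketch generates a HiveQL `CREATE EXTERNAL TABLE` statement from Python. The function name, the table and column names, and the `wasb:///` location are all hypothetical; only the HiveQL clauses themselves (`PARTITIONED BY`, `CLUSTERED BY ... INTO n BUCKETS`, `STORED AS`, `LOCATION`) are standard Hive syntax.

```python
def external_table_ddl(table, columns, partition_cols, bucket_col,
                       num_buckets, location, fmt="ORC"):
    """Build a HiveQL CREATE EXTERNAL TABLE statement.

    columns and partition_cols are lists of (name, hive_type) pairs.
    """
    column_names = {name for name, _ in columns}
    # Hive rule: partition columns are declared separately, not in the column list.
    for name, _ in partition_cols:
        if name in column_names:
            raise ValueError(f"partition column {name!r} must not be a regular column")
    # Hive rule: the bucketing column must be one of the regular columns.
    if bucket_col not in column_names:
        raise ValueError(f"bucket column {bucket_col!r} must be a regular column")

    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    parts = ", ".join(f"{name} {hive_type}" for name, hive_type in partition_cols)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"PARTITIONED BY ({parts})\n"
        f"CLUSTERED BY ({bucket_col}) INTO {int(num_buckets)} BUCKETS\n"
        f"STORED AS {fmt}\n"
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    print(external_table_ddl(
        table="sales",                                   # hypothetical table
        columns=[("customer_id", "INT"), ("amount", "DOUBLE")],
        partition_cols=[("sale_date", "STRING")],
        bucket_col="customer_id",
        num_buckets=8,
        location="wasb:///example/data/sales",           # hypothetical path
    ))
```

Partitioning lets Hive skip whole directories when a query filters on `sale_date`, while bucketing on `customer_id` spreads rows across a fixed number of files, which can help joins and sampling on that column.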
  Design batch ETL solutions for big data with Spark
  Share resources between Spark applications using YARN queues and preemption, select Spark executor and driver settings for optimal performance, use partitioning and bucketing to improve Spark performance, connect to external Spark data sources, incorporate custom Python and Scala code in a Spark Datasets program, identify query bottlenecks using the Spark SQL query graph
  Operationalize Hadoop and Spark
  Create and customize a cluster by using ADF; attach storage to a cluster and run an ADF activity; choose between bring-your-own and on-demand clusters; use Apache Oozie with HDInsight; choose between Oozie and ADF; share metastore and storage accounts between a Hive cluster and a Spark cluster to enable the same table across the cluster types; select an appropriate storage type for a data pipeline, such as Blob storage, Azure Data Lake, and local Hadoop Distributed File System (HDFS)
  
  Implement Big Data Interactive Processing Solutions
  Implement interactive queries for big data with Spark SQL
  Execute queries using Spark SQL, cache Spark DataFrames for iterative queries, save Spark DataFrames as Parquet files, connect BI tools to Spark clusters, optimize join types such as broadcast versus merge joins, manage Spark Thrift server and change the YARN resources allocation, identify use cases for different storage types for interactive queries
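The broadcast-versus-merge join trade-off listed above can be illustrated without a cluster. The sketch below is plain Python, not Spark code: it mimics the two strategies on lists of dictionaries so the difference in approach is visible. The function names and sample rows are hypothetical.

```python
def broadcast_hash_join(large, small, key):
    """Inner join: build a hash table from the small side and probe it.

    This mirrors the idea of a broadcast join, where the small table is
    copied to every worker so the large table never has to be shuffled.
    """
    lookup = {}
    for row in small:
        lookup.setdefault(row[key], []).append(row)
    return [{**big, **match} for big in large for match in lookup.get(big[key], [])]


def sort_merge_join(left, right, key):
    """Inner join: sort both sides on the key, then walk them in step.

    This mirrors a sort-merge join, which suits two large inputs because
    neither side has to fit in memory as a hash table.
    """
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row in the matching run with every right row.
            while i < len(left) and left[i][key] == lk:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out


if __name__ == "__main__":
    orders = [{"cid": 1, "amt": 10}, {"cid": 2, "amt": 5},
              {"cid": 1, "amt": 7}, {"cid": 3, "amt": 1}]
    customers = [{"cid": 1, "name": "a"}, {"cid": 2, "name": "b"}]
    print(broadcast_hash_join(orders, customers, "cid"))
    print(sort_merge_join(orders, customers, "cid"))
```

Both functions return the same set of joined rows; they differ only in row order and in how much work each side requires, which is the consideration behind choosing a join type for a given pair of tables.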
  Perform exploratory data analysis by using Spark SQL
  Use Jupyter and Apache Zeppelin for visualization and developing tidy Spark DataFrames for modeling, use Spark SQL's two-table joins to merge DataFrames and cache results, save tidied Spark DataFrames to a performant format for reading and analysis (Apache Parquet), manage interactive Livy sessions and their resources
  Implement interactive queries for big data with Interactive Hive
  Enable Hive LLAP through Hive settings, manage and configure memory allocation for Hive LLAP jobs, connect BI tools to Interactive Hive clusters
  Perform exploratory data analysis by using Hive
  Perform interactive querying and visualization, use Ambari Views, use HiveQL, parse CSV files with Hive, use ORC versus Text for caching, use internal and external tables in Hive, use Zeppelin to visualize data
  Perform interactive processing by using Apache Phoenix on HBase
  Use Phoenix in HDInsight; use Phoenix grammar for queries; configure transactions, user-defined functions, and secondary indexes; identify and optimize Phoenix performance; select between Hive, Spark, and Phoenix on HBase for interactive processing; identify when to share metastore between a Hive cluster and a Spark cluster
  
  Implement Big Data Real-Time Processing Solutions
  Create Spark streaming applications using DStream API
  Define DStreams and compare them to Resilient Distributed Datasets (RDDs), start and stop streaming applications, transform DStreams (flatMap, reduceByKey, updateStateByKey), persist long-term data in HBase and SQL, persist long-term data in Azure Data Lake and Azure Blob Storage, stream data from Apache Kafka or Event Hub, visualize streaming data in a Power BI real-time dashboard
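The stateful `updateStateByKey` transformation named above is easier to reason about once its per-batch semantics are spelled out. The following is a plain-Python simulation, not Spark code: each list of lines stands in for one micro-batch, and the helper `update_state_by_key` is a hypothetical stand-in that applies an update function the way the DStream operation does (new values for the key plus the previous state, with `None` dropping the key).

```python
def update_state_by_key(state, batch, update_func):
    """Apply update_func to every key seen in the batch or already in state.

    batch is a list of (key, value) pairs for one micro-batch.
    update_func(new_values, previous_state) returns the new state,
    or None to remove the key.
    """
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    new_state = {}
    for key in set(state) | set(grouped):
        result = update_func(grouped.get(key, []), state.get(key))
        if result is not None:
            new_state[key] = result
    return new_state


def running_count(new_values, previous):
    """Update function for a running word count."""
    return sum(new_values) + (previous or 0)


def word_pairs(lines):
    """flatMap each line into words, then map each word to (word, 1)."""
    return [(word, 1) for line in lines for word in line.split()]


if __name__ == "__main__":
    micro_batches = [["a b a"], ["b c"]]  # hypothetical input, one list per batch
    state = {}
    for lines in micro_batches:
        state = update_state_by_key(state, word_pairs(lines), running_count)
        print(sorted(state.items()))
```

Unlike `reduceByKey`, which only sees the values inside one batch, the state here carries over between batches, which is why a real streaming application needs checkpointing enabled for this operation.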
  Create Spark structured streaming applications
  Use DataFrames and Datasets APIs to create streaming DataFrames and Datasets; create window operations on event time; define window transformations for stateful and stateless operations; stream window functions, reduce by key, and window to summarize streaming data; persist long-term data in HBase and SQL; persist long-term data in Azure Data Lake and Azure Blob Storage; stream data from Kafka or Event Hub; visualize streaming data in a Power BI real-time dashboard
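Window operations on event time, mentioned above, group records by when the event happened rather than when it arrived. The sketch below shows that idea in plain Python rather than through the Structured Streaming API: it assigns each event to a fixed-width (tumbling) window and counts events per window and key. The function names and sample events are hypothetical, windows are assumed to be aligned to the Unix epoch, and watermarking for late data is left out.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(event_time, width):
    """Return the start of the tumbling window that contains event_time.

    event_time must be timezone-aware; width is a timedelta.
    """
    index = (event_time - EPOCH) // width  # whole windows since the epoch
    return EPOCH + index * width


def count_by_window(events, width):
    """Count (event_time, key) events per (window start, key)."""
    counts = Counter()
    for event_time, key in events:
        counts[(window_start(event_time, width), key)] += 1
    return counts


if __name__ == "__main__":
    utc = timezone.utc
    # Hypothetical device events, deliberately listed out of order.
    events = [
        (datetime(2017, 2, 22, 12, 5, 0, tzinfo=utc), "a"),
        (datetime(2017, 2, 22, 12, 0, 30, tzinfo=utc), "a"),
        (datetime(2017, 2, 22, 12, 7, 0, tzinfo=utc), "b"),
        (datetime(2017, 2, 22, 12, 4, 59, tzinfo=utc), "a"),
    ]
    for (start, key), n in sorted(count_by_window(events, timedelta(minutes=5)).items()):
        print(start.isoformat(), key, n)
```

Because the window is derived from the event's own timestamp, the record that arrives second in the list still lands in the earlier 12:00 window, which is the behavior that distinguishes event-time windows from processing-time windows.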
  Develop big data real-time processing solutions with Apache Storm
  Create Storm clusters for real-time jobs, persist long-term data in HBase and SQL, persist long-term data in Azure Data Lake and Azure Blob Storage, stream data from Kafka or Event Hub, configure event windows in Storm, visualize streaming data in a Power BI real-time dashboard, define Storm topologies and describe Storm computation graph architecture, create Storm streams and conduct streaming joins, run Storm topologies in local mode for testing, configure Storm applications (workers, debug mode), conduct stream groupings to broadcast tuples across components, debug and monitor Storm jobs
  Build solutions that use Kafka
  Create Spark and Storm clusters in the virtual network, manage partitions, configure MirrorMaker, start and stop services through Ambari, manage topics
  Build solutions that use HBase
  Identify HBase use cases in HDInsight, use HBase Shell to create, update, and drop HBase tables, monitor an HBase cluster, optimize the performance of an HBase cluster, identify use cases for using Phoenix for analytics of real-time data, implement replication in HBase
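For the HBase Shell item above, this sketch assembles the shell statements for a table's life cycle: `create`, `put`, `alter`, and the `disable` plus `drop` pair that HBase requires before a table can be removed. The Python helper names and the sample table, column family, and values are hypothetical; the sketch only produces the text that would be typed into `hbase shell`.

```python
def _quote(value):
    """Wrap a value in single quotes as HBase Shell expects."""
    if "'" in value:
        raise ValueError(f"single quotes are not supported in this sketch: {value!r}")
    return f"'{value}'"


def create_table(table, *families):
    """create 'table', 'family1', 'family2', ..."""
    if not families:
        raise ValueError("an HBase table needs at least one column family")
    return "create " + ", ".join(_quote(x) for x in (table, *families))


def put_cell(table, row, column, value):
    """put 'table', 'row', 'family:qualifier', 'value'"""
    return "put " + ", ".join(_quote(x) for x in (table, row, column, value))


def set_versions(table, family, versions):
    """alter a column family to keep the given number of cell versions."""
    return f"alter {_quote(table)}, NAME => {_quote(family)}, VERSIONS => {int(versions)}"


def drop_table(table):
    """A table must be disabled before it can be dropped."""
    return [f"disable {_quote(table)}", f"drop {_quote(table)}"]


if __name__ == "__main__":
    script = [
        create_table("sensors", "d"),                    # hypothetical table
        put_cell("sensors", "row1", "d:temp", "21.5"),
        set_versions("sensors", "d", 3),
        *drop_table("sensors"),
    ]
    print("\n".join(script))
```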
