{"product_id":"9781638350361","title":"Mastering Large Datasets with Python: Parallelize and Distribute Your Python Code","description":"\u003cstrong\u003eSummary\u003c\/strong\u003e \u003cbr\u003eModern data science solutions need to be clean, easy to read, and scalable. In \u003ci\u003eMastering Large Datasets with Python\u003c\/i\u003e, author J.T. Wolohan teaches you how to take a small project and scale it up using a functionally influenced approach to Python coding. You’ll explore methods and built-in Python tools that lend themselves to clarity and scalability, such as high-performance parallelism, as well as distributed technologies that allow for high data throughput. The abundant hands-on exercises in this practical tutorial will lock in these essential skills for any large-scale data science project.\u003cbr\u003e\u003cbr\u003ePurchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.\u003cbr\u003e\u003cbr\u003e \u003cstrong\u003eAbout the technology\u003c\/strong\u003e \u003cbr\u003eProgramming techniques that work well on laptop-sized data can slow to a crawl, or fail altogether, when applied to massive files or distributed datasets. By mastering the powerful map and reduce paradigm, along with the Python-based tools that support it, you can write data-centric applications that scale efficiently without requiring codebase rewrites as your requirements change.\u003cbr\u003e\u003cbr\u003e \u003cstrong\u003eAbout the book\u003c\/strong\u003e \u003cbr\u003e\u003ci\u003eMastering Large Datasets with Python\u003c\/i\u003e teaches you to write code that can handle datasets of any size. You’ll start with laptop-sized datasets, learning to parallelize data analysis by breaking large tasks into smaller ones that can run simultaneously. You’ll then scale those same programs to industrial-sized datasets on a cluster of cloud servers. 
With the map and reduce paradigm firmly in place, you’ll explore tools like Hadoop and PySpark to efficiently process massive distributed datasets, speed up decision-making with machine learning, and simplify your data storage with AWS S3.\u003cbr\u003e\u003cbr\u003e \u003cstrong\u003eWhat's inside\u003c\/strong\u003e \u003cbr\u003e \u003cul\u003e \u003cli\u003eAn introduction to the map and reduce paradigm\u003c\/li\u003e \u003cli\u003eParallelization with the multiprocessing module and the pathos framework\u003c\/li\u003e \u003cli\u003eHadoop and Spark for distributed computing\u003c\/li\u003e \u003cli\u003eRunning AWS jobs to process large datasets\u003c\/li\u003e \u003c\/ul\u003e \u003cbr\u003e\u003cbr\u003e \u003cstrong\u003eAbout the reader\u003c\/strong\u003e \u003cbr\u003eFor Python programmers who need to work faster with more data.\u003cbr\u003e\u003cbr\u003e \u003cstrong\u003eAbout the author\u003c\/strong\u003e \u003cbr\u003e\u003cb\u003eJ. T. Wolohan\u003c\/b\u003e is a lead data scientist at Booz Allen Hamilton and a PhD researcher at Indiana University, Bloomington.\u003cbr\u003e\u003cbr\u003eTable of Contents:\u003cbr\u003e \u003cbr\u003ePART 1\u003cbr\u003e \u003cbr\u003e1 ¦ Introduction\u003cbr\u003e \u003cbr\u003e2 ¦ Accelerating large dataset work: Map and parallel computing\u003cbr\u003e \u003cbr\u003e3 ¦ Function pipelines for mapping complex transformations\u003cbr\u003e \u003cbr\u003e4 ¦ Processing large datasets with lazy workflows\u003cbr\u003e \u003cbr\u003e5 ¦ Accumulation operations with reduce\u003cbr\u003e \u003cbr\u003e6 ¦ Speeding up map and reduce with advanced parallelization\u003cbr\u003e \u003cbr\u003ePART 2\u003cbr\u003e \u003cbr\u003e7 ¦ Processing truly big datasets with Hadoop and Spark\u003cbr\u003e \u003cbr\u003e8 ¦ Best practices for large data with Apache Streaming and mrjob\u003cbr\u003e \u003cbr\u003e9 ¦ PageRank with map and reduce in PySpark\u003cbr\u003e \u003cbr\u003e10 ¦ Faster 
decision-making with machine learning and PySpark\u003cbr\u003e \u003cbr\u003ePART 3\u003cbr\u003e \u003cbr\u003e11 ¦ Large datasets in the cloud with Amazon Web Services and S3\u003cbr\u003e \u003cbr\u003e12 ¦ MapReduce in the cloud with Amazon’s Elastic MapReduce","brand":"Manning","offers":[{"title":"Default Title","offer_id":46634642538737,"sku":"9781638350361","price":49.99,"currency_code":"USD","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0674\/5433\/7265\/files\/9781638350361_p0.jpg?v=1765422827","url":"https:\/\/shop.barnesandnoble.com\/products\/9781638350361","provider":"Barnes \u0026 Noble","version":"1.0","type":"link"}