The algorithms and data infrastructure at Stitch Fix is housed in AWS. Data acquisition is split between events flowing through Kafka and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3 based data warehouse. Apache Spark on YARN is our tool of choice for data movement and ETL. Because our storage layer (S3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling YARN clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for ad hoc queries and dashboards.

Beyond data movement and ETL, most ML-centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see ).

At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated into systems. That requires a serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan provides our data scientists the ability to quickly productionize the models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn), by automatically packaging them as Docker containers and deploying them to Amazon ECS. This gives our data scientists a one-click method of getting from their algorithms to production.

Back in 2014, I was given an opportunity to re-architect the SmartZip Analytics platform and its flagship product, SmartTargeting. This is a SaaS product that helps real estate professionals keep up with their prospects and leads in a given neighborhood or territory, find out (thanks to predictive analytics) who is most likely to list or sell their home, and run cross-channel marketing automation against them: direct mail, online ads, email. The company also provides Data APIs to enterprise customers.

I had inherited years and years of technical debt and I knew things had to change radically. The first enabler was to make use of the cloud and go with AWS, so we would stop re-inventing the wheel and build around managed, scalable services.

For the SaaS product, we kept working with Rails, as this was what my team knew best. We did, however, break up the monolith and decouple the front-end application from the backend using Rails API, so that we would get independently scalable micro-services from then on.

Our various applications could now be deployed using AWS Elastic Beanstalk, so we wouldn't waste any more effort writing time-consuming Capistrano deployment scripts, for instance. We combined this with Docker so each application would run within its own container, independently of the underlying host configuration.

Storage-wise, we went with Amazon S3 and ditched the pre-existing local or network storage people used to deal with in our legacy systems. On the database side, we started with Amazon RDS / MySQL and ultimately migrated to Amazon RDS for Aurora / MySQL when it was released. Once again, this is a place where you need a managed service that your cloud provider handles for you.

Future improvements / technology decisions included:
Search: Elasticsearch / Amazon Elasticsearch Service / Algolia

As our usage grew, patterns changed, and our business needs evolved, my role as Engineering Manager and then Director of Engineering was also to ensure my team kept learning and innovating while delivering business value.

One of these innovations was to get ourselves into serverless: adopting AWS Lambda was a big step forward. At the time it was only available for Node.js (not Ruby), but it was a great way to handle cost efficiency, unpredictable traffic, and sudden bursts of traffic. Ultimately you want the whole chain of services involved in a call to be serverless, and that's when we started leveraging Amazon DynamoDB on these projects so they would be fully scalable.
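The serverless chain described above, a Lambda function backed by DynamoDB, might look roughly like the sketch below. It is written in Python for brevity, even though Lambda only supported Node.js at the time. The table name `leads`, the API Gateway style event shape, and the item fields are all assumptions made for illustration, not details from the actual project.

```python
import json
import time
import uuid


def build_lead_item(payload, now=None):
    """Turn a request payload into a DynamoDB item (hypothetical schema)."""
    return {
        "id": str(uuid.uuid4()),
        "email": payload["email"],
        "territory": payload.get("territory", "unknown"),
        # Integer timestamp: boto3's resource API rejects Python floats.
        "created_at": int(now if now is not None else time.time()),
    }


def make_handler(table):
    """Build a Lambda handler around any object exposing put_item(Item=...)."""

    def handler(event, context):
        # Assumes an API Gateway proxy-style event with a JSON string body.
        payload = json.loads(event["body"])
        item = build_lead_item(payload)
        table.put_item(Item=item)
        return {"statusCode": 201, "body": json.dumps({"id": item["id"]})}

    return handler


# In a real deployment the table would come from boto3 (not imported here so
# the sketch runs without AWS credentials); "leads" is a made-up table name:
#   import boto3
#   handler = make_handler(boto3.resource("dynamodb").Table("leads"))
```

Passing the table in, rather than creating it inside the handler, keeps the function free of local state and makes it easy to exercise with a stand-in object before any AWS resources exist.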