On the second day of re:Invent, AWS CEO Andy Jassy announced several new products. Here is a summary and my thoughts on the products and how they are going to change the business:
This year AWS focused heavily on promoting the Nitro System, the underlying platform that powers EC2 and several other services. With this system, they say they can deliver instances to customers faster and at lower cost.
They also introduced some new instance types, including Inf1, powered by their own Inferentia chips to address machine learning use cases. To be honest, I was amazed. The market for CPUs has been very lively lately: Intel and AMD are trading blows in the x86 market, and every phone manufacturer (Apple, Samsung, Huawei) has been trying to design its own CPUs. And now there is a brand-new market: chips made for specific use cases, like Amazon's Inferentia, optimized for a single workload instead of general-purpose computing. This should be a very entertaining competition to watch.
A very big announcement: Amazon EKS on AWS Fargate.
With a lot of customers already deploying their Kubernetes clusters on AWS, AWS has now decided to equip its Elastic Kubernetes Service with Fargate. Since Fargate lets people run containers without worrying about provisioning or managing servers, I expect many customers will be very happy to use this service.
But I wouldn’t want to see AWS establish a complete monopoly on the container scene; I really hope competitors can catch up soon.
Lots of enhancements have been made to AWS’s database offerings: Amazon S3 Access Points, Data Lake Export, Amazon Redshift RA3 instances with managed storage, Advanced Query Accelerator for Amazon Redshift, and UltraWarm for Elasticsearch. But what caught my attention the most was the Amazon Managed Apache Cassandra Service.
A lot of companies like ours have been using Cassandra because it offers very fast insertion speeds compared to other databases. With no equivalent on AWS, we more or less had to use DynamoDB instead, which results in two different data models, a chore to maintain. We would be very happy to see this new service released, and at a reasonable price too, so that we can reduce the time it takes to migrate our on-premises software to the cloud.
Machine learning was a heavy focus of the event this year, and AWS didn’t fail to impress.
The first “wow” was Amazon SageMaker Studio, a web-based IDE for the complete machine learning workflow. Building, training, and deploying machine learning models can now be done in a single interface. Along with it came several other services: SageMaker Notebooks, Experiments, Debugger, Model Monitor, and Autopilot.
And they managed to surprise me even further with Amazon CodeGuru, a service that reviews code not only in the sense of linting and finding syntactic errors, but also finding logical errors, expensive operations, concurrency issues, etc. If it turns out to be even half as good as they say, it will definitely be a game changer.
One more interesting announcement is Kendra, an internal enterprise search engine. This sounds like a direct competitor to our product Neuron. It got me thinking: AWS is not just an infrastructure company anymore; like its parent company Amazon, it now participates in every market. What I wonder about is the motive behind this move. Is AWS going to offer very domain-specific applications and wipe out smaller competitors completely, or is it going to build an ecosystem where smaller businesses use its applications as the core and offer additional services to customers to make a profit? One thing I know for sure: our product Neuron will have to improve significantly so that we can keep being one of the best offerings on the market.
My only complaint is that they should have let the rock band perform a little more, instead of just verses of songs. I felt bad for the band.
-- Cuong Nguyen, Software developer @ Brains Technology, Inc.
Building ML practices to address the top four use cases
I woke up very early in the morning (as early as 2 a.m.), so I arrived at the venue very early too. The session room was shared with other sessions, so there was no loudspeaker, just headphones set up for people who couldn’t hear the speaker directly.
This session was kind of an introduction to the ML stack offerings from AWS. Though it didn’t provide much detail about how to implement things, the speakers did give a brief idea of the steps you need to take to address your machine learning problems (four use cases were mentioned in this session: recommendation engines, forecasting, computer vision, and natural language processing).
What I learned from this session: as an AWS user, you should first take advantage of the very convenient offerings like Personalize, Forecast, Rekognition, Textract, etc. to address your problem. These options are convenient because you don’t really need your own data initially to create your first models. It’s better to have a mediocre model than nothing, so if your system is just in its initial state, those are very helpful.
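For example, here is a minimal sketch of what calling one of these pre-trained services looks like with boto3; the bucket and object names are placeholders I made up, not anything from the session:

```python
import boto3

# Minimal sketch: ask Amazon Rekognition (a pre-trained "easy button" service)
# to label an image that already lives in S3. Bucket/key are hypothetical.
rekognition = boto3.client("rekognition", region_name="us-east-1")

response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-example-bucket", "Name": "photos/cat.jpg"}},
    MaxLabels=10,
    MinConfidence=80,
)

# Print the detected labels and how confident the service is about each one.
for label in response["Labels"]:
    print(label["Name"], label["Confidence"])
```

No model training, no data collection; you only pay per API call, which is what makes these services attractive when you are starting from nothing.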
Then, when you have your own data, you can either feed it into those “easy button” solutions, or you can customize the process further using SageMaker, where there are lower-level options to choose from: several built-in algorithms that you can tweak to your liking. There is also a marketplace of pre-built models for different use cases that you can browse and buy if one satisfies your needs.
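As a rough illustration of that lower-level path, here is a hedged sketch using the SageMaker Python SDK with one of the built-in algorithms (XGBoost); the role ARN, S3 paths, and hyperparameters are assumptions of mine, not something shown in the session:

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # hypothetical role ARN

# Look up the container image for the built-in XGBoost algorithm in this region.
container = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-example-bucket/models/",  # hypothetical bucket
    sagemaker_session=session,
)
# These are the knobs you can tweak to your liking.
estimator.set_hyperparameters(objective="reg:squarederror", num_round=100)

# Train on data you have already uploaded to S3 (hypothetical path),
# then deploy the resulting model behind a real-time endpoint.
estimator.fit({"train": "s3://my-example-bucket/train/"})
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```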
And if no pre-built solution satisfies your needs, you can go all the way down to the programming level, where you can use frameworks like TensorFlow and PyTorch, and interfaces like Keras or Amazon’s Gluon, to program your ML pipeline yourself.
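To give a feel for that lowest level, here is a tiny self-contained PyTorch sketch (one of the frameworks mentioned) with random stand-in data; it is only an illustration of writing the model and training loop yourself, not anything from the session:

```python
import torch
import torch.nn as nn

# A small classifier defined by hand and one training step on random data.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in batch: 32 samples with 20 features each and binary labels.
x = torch.randn(32, 20)
y = torch.randint(0, 2, (32,))

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
print(f"training loss: {loss.item():.4f}")
```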
The full event can be found here.
(Update: at the keynote one day later, AWS announced some services that make this process much more convenient, including a cloud-based IDE for SageMaker, along with multiple SageMaker-related goodies.)
Advanced Design Patterns for DynamoDB
This is a repeat session from 2018, but there were still many repeat sessions held this year, with a lot of people attending them. I can understand why. Everybody is using NoSQL for their workloads, but not many of us (myself included) truly understand what it is and how to utilize the technology.
Speaker Rick Houlihan started the presentation by addressing the reason why NoSQL is taking over RDBMS: data gets bigger, storage gets cheaper, and CPU time is still expensive, so it is more appropriate to store denormalized data that is ready for fast querying than to store normalized data that needs heavy computation at query time.
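To make that argument concrete, here is a small sketch of the denormalized approach with DynamoDB and boto3; the table name, keys, and attributes are hypothetical examples of mine, not from the talk:

```python
import boto3
from boto3.dynamodb.conditions import Key

# Sketch of "store denormalized, query-ready data": an order item carries the
# customer details it needs, so reading it requires no join at query time.
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("example-orders")

table.put_item(
    Item={
        "PK": "CUSTOMER#123",
        "SK": "ORDER#2019-12-02#0001",
        "customer_name": "Alice",         # duplicated from the customer record
        "shipping_address": "1 Main St",  # duplicated so the item is self-contained
        "total": 42,
    }
)

# One cheap key lookup returns everything needed to render this customer's orders.
response = table.query(KeyConditionExpression=Key("PK").eq("CUSTOMER#123"))
print(response["Items"])
```

The storage cost of duplicating a few attributes is small; what you save is the CPU-heavy joins and aggregations at read time, which is exactly the trade-off described above.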
NoSQL is a very different beast from traditional relational databases. Using NoSQL the same way as a relational database is wrong, and using the same data model is wrong too. In this session, Mr. Houlihan briefly introduced the data model of DynamoDB, and then some advanced design patterns such as:
Choosing a partition key to optimize partitioning performance