The latest Google Gemini 2.0 is awesome: real-time AI video
Google is on a roll. The newly upgraded multimodal model Gemini 2.0 supports video calls, screen sharing, and more, and it's all free. Search for "Google AI Studio", sign in with a Google account, and you can use it at no cost. The interface is shown in Figure 2 and Figure 3.
Let me walk through a few features that I personally think are awesome.
1. Pure multimodal video conversation. Click "Stream Realtime" in the left sidebar, select "Show Gemini" in the panel on the right, and you can open your camera and have a video conversation with it directly, much like ChatGPT's Advanced Voice Mode. For now it can't speak Chinese: it understands Chinese but replies in English. It understands whatever your camera sees. For example, when I asked it to describe in detail what was behind me, it could. I held my Kindle up to the camera, and just from the text on the screen it told me what the book was about. It could even identify the brand of my watch. Haha.
2. Share your screen and it becomes an AI work partner. Click "Share Your Screen" in the panel on the right; you can share just a browser window or the entire desktop. Then you can chat with it while you work: have it look at the web page you're browsing, or show it a chat window and ask how you could reply better (for example, open a WeChat conversation and let it be your chat advisor). Even better, it can watch YouTube videos with you and chat along the way, and if you don't understand something, you can ask it to explain. That really is a new kind of experience.
3. If you don't want video, you can also have a plain voice conversation, similar to ChatGPT's Advanced Voice Mode: just choose "Talk to Gemini".
4. It has also opened up agent capabilities with tool use (automatically calling Google's tools), plus long-text and long-table processing and multiple breakthroughs in coding ability, laying a solid foundation for Project Astra, the full AI-assistant effort. If you're interested, dig into it yourself. Google's update this time is in no way inferior to OpenAI's 12-day marathon of announcements. By the way, Google also launched the video generation model Veo 2.
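Beyond the AI Studio UI, these models are also reachable through the Gemini API. As a rough sketch (the endpoint path, the experimental model ID "gemini-2.0-flash-exp", and the request shape below are my assumptions based on the public API's documented format, not anything stated in this post), here is how a single-turn text request body could be built in Python; actually sending it would require a free API key from AI Studio:

```python
import json

# Assumed endpoint for the experimental Gemini 2.0 Flash model; an API key
# from Google AI Studio must be appended as ?key=... before sending.
GEMINI_ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-2.0-flash-exp:generateContent"
)

def build_request(prompt: str) -> str:
    """Build the JSON body for a single-turn text request (assumed shape)."""
    body = {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
    return json.dumps(body)

payload = build_request("Summarize this page for me.")
print(payload)
```

This only constructs the request body; POSTing `payload` to the endpoint with any HTTP client (and a valid key) would return the model's reply.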
DeepMind CEO Demis Hassabis said we are entering an agentic era, a world built on intelligent agents. Gemini 2.0 feels like it has lifted a corner of the curtain on that future world. Find a way to experience it; it's even free.
The big players are still the big players. Google's recent string of AI applications keeps getting better: NotebookLM, Learn About, and now the underlying base model Gemini has been updated to 2.0, with major progress across its capabilities, and its own apps and models integrate very well (see NotebookLM and Learn About for details). It feels like the global AI field has entered a three-way era of OpenAI, Anthropic, and Google, and Google's staying power is only growing.
Screen sharing is a game changer. If only Manus had screen sharing too!