Generate videos in Gemini and Whisk with Veo 2

(blog.google)

296 points | by meetpateltech 15 hours ago

21 comments

  • minimaxir 14 hours ago
    Whisk itself (https://labs.google/fx/tools/whisk) was released a few months ago under the radar as a demo for Imagen 3 and it's actually fun to play with and surprisingly robust given its particular implementation.

    It uses a prompt transmutation trick (convert the uploaded images into a textual description; you can verify this by viewing the description of the uploaded image) and the strength of Imagen 3's genuinely modern text encoder to adhere to those long transmuted descriptions for Subject/Scene/Style.
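
    Roughly, the pattern looks like this. A minimal sketch with open-source stand-ins (BLIP for captioning, Stable Diffusion for generation), not Google's actual Imagen 3 stack; model names are common public checkpoints and file paths are placeholders:

      # Sketch of the "transmute image to text, then regenerate from text" pattern.
      import torch
      from PIL import Image
      from transformers import BlipProcessor, BlipForConditionalGeneration
      from diffusers import StableDiffusionPipeline

      # 1. Caption the uploaded image (the "transmutation" step).
      processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
      captioner = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
      subject = Image.open("subject.png")  # hypothetical uploaded Subject image
      inputs = processor(subject, return_tensors="pt")
      caption = processor.decode(captioner.generate(**inputs, max_new_tokens=60)[0],
                                 skip_special_tokens=True)

      # 2. Feed the caption (plus Scene/Style text) to a text-to-image model.
      pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5",
                                                     torch_dtype=torch.float16).to("cuda")
      prompt = f"{caption}, standing in a neon-lit alley, watercolor style"
      pipe(prompt).images[0].save("remixed.png")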

    • torginus 13 hours ago
      Why text? Why not encode the image into some latent space representation, so that it can survive a round-trip more or less faithfully?
      • minimaxir 13 hours ago
        Because Imagen 3 is a text-to-image model, not an image-to-image model, the inputs have to be some form of text. Multimodal models such as 4o image generation or Gemini 2.0, which can take in both text and image inputs, do encode image inputs to a latent space through a Vision Transformer, but not reversibly or losslessly.
      • Uehreka 8 hours ago
        There’s a thing called CLIP Vision that sort of does that, but it converts the image into conditioning space (the same space as the embeddings from a text prompt). I’d say it works… OK.
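
        A minimal sketch of what "image into the same space as the text embeddings" looks like with the open CLIP weights (not whatever Whisk or Imagen use internally; the input file is a placeholder):

          import torch
          from PIL import Image
          from transformers import CLIPModel, CLIPProcessor

          model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
          processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

          image = Image.open("photo.png")  # hypothetical input image
          inputs = processor(text=["a fox in the snow"], images=image,
                             return_tensors="pt", padding=True)
          out = model(**inputs)

          # Both embeddings land in the same projection space (512-dim for this
          # checkpoint), so an image can condition generation much like a prompt.
          print(out.image_embeds.shape, out.text_embeds.shape)
          print(torch.cosine_similarity(out.image_embeds, out.text_embeds))
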
      • doctorpangloss 9 hours ago
        They don't want you to modify images you supply yourself.
      • flkenosad 12 hours ago
        Text might honestly be the best latent space representation.
    • cubefox 13 hours ago
      > This tool isn’t available in your country yet

      > Enter your email to be notified when it becomes available

      (Submit)

      > We can't collect your emails at the moment

      • fragmede 1 hour ago
        GDPR ftw!
        • patates 1 hour ago
          I'm not a lawyer but I thought GDPR didn't prevent that. It adds a lot of restrictions on how they can use those emails and for how long, but it's not a complete ban on explicit sharing of emails.
          • fragmede 1 hour ago
            If you read it very carefully, and then behave very carefully, you can comply with the law. Orrrrrr you can just not bother for your first pass, simply block the EU for now, and release it for them after you go and clean it up later.
  • Palmik 55 minutes ago
    There's also Google Vids, also using Veo 2 under the hood. Product confusion :) https://workspace.google.com/products/vids/
  • pdntspa 2 hours ago
    I burned through $48 in GCP credit making 12x 8-second videos in Veo2. Beware...
  • delichon 15 hours ago
    I think I would buy "yes" shares in a Polymarket event that predicts a motion picture created by a single person grossing more than $100M by 2027.
    • tracerbulletx 14 hours ago
      Everyone keeps ignoring supply and demand when talking about the impacts of AI. Let's just assume it really gets so good you can do this and it doesn't suck.

      Yes, the costs will get so low that there will be almost no barrier to making content. But if there is no barrier to making content, the ROI will be massive, and so everyone will be doing it; you can more or less have the exact movie you want in your head on demand, and even if you want a bespoke movie from an artist with great taste and a point of view, there will be 10,000 of them every year.

      • motoxpro 13 hours ago
        Totally agree.

        This is what Instagram and YouTube did and we got MrBeast and Kylie Jenner making billions of dollars. The cost of creating content is tapping record on your phone and the traditional "quality" as defined by visuals doesn't matter (see Quibi). Viral videos are selfies recorded in the bedroom.

        When you lower the barrier to entry things get more heterogeneous, not less. So you have bigger outcomes, not smaller, because the playing field expands. TikTok itself was built on surfacing the one good video from a pool of tens of millions. The platforms that surface the best content will be even more important.

        It's a little disheartening, I think: people assume the only reason they can't be creative is money, time, or technical skill, but in reality it's just that they aren't that creative.

        So yes, everyone can create content in a world of AI, but not everyone is a good content creator/director/artist (or has the vision), same as it is now.

        • SirMaster 12 hours ago
          Will the AI itself never be a good content creator/director/artist?

          People are always out there trying to convince others that AI is better than humans at X. How close is it to being better than humans at being a content creator itself? Or how long before that threshold is crossed?

          • Workaccount2 12 hours ago
            It will always be subjective. There will always be holdouts who will denounce any AI work as "bad" simply because it was created by AI.

            Even when AI is objectively better and dominates in blind ratings tests, there will still be a strong market for "authentic" media.

            For instance, we already have factories that churn out wares that are cheaper, stronger, better looking, and longer lasting than "hand made", yet people still seek out malformed $60 coffee mugs from the local artisan section in country shops.

        • darepublic 12 hours ago
          I don't think Mr Beast is particularly creative. He makes common denominator crap that appeals to kids. I expect the same of Kylie Jenner
          • wongarsu 8 hours ago
            Meanwhile the cost of his videos is insanely high. The "insane" prize money is the smallest part of it. He has insane sets he uses for only one or a small number of videos, he has a giant staff, high quality gear, and many of his videos involve either challenges going on over very long timespans or a high number of participants, making the logistics, recording and editing of those videos challenging and time intensive. Most TV shows could only dream of doing what he does.

            He started out simple, pointing a phone camera at himself counting really high, but his current channel is not a great example of a low barrier to entry. He explicitly sets himself apart by doing what other YouTube creators or TV shows simply can't do.

          • motoxpro 11 hours ago
            You may not like them, as another poster said, it's all subjective.

            That doesn't mean they aren't incredibly good at what they do and that millions (billions) of people have tried to do what they have and failed.

            One of the reasons it's "common denominator crap" is because the blob of the internet has 100s of millions of videos copying MrBeast, and the Jenner/Kardashians created an entire generation of people that wanted to be influencers. Most of the copies are slop.

            Once they are entrenched they can continue to produce "crap" as you call it because they have distribution. The copies don't work because they aren't novel, which makes people feel like it doesn't take talent and is the algorithm's fault, until the next person to be "creative" gets distribution and the cycle repeats.

            There is just a lot less creativity than people imagine. It's not a right that we all have as humans; it's rare. 8.2 billion people on earth, 365 days in a year, 3 trillion shots on goal, and only a few hundred novel discoveries, art creations, companies, and ideas come from it.

        • jayd16 6 hours ago
          No single piece of content grossed 100m though. It just allowed for more low investment content at a higher rate, while the popularity of the site pushed them to celebrity status.
      • yorwba 14 hours ago
        And one of those 10,000 will have a multimillion marketing budget, and people will be talking about it online and remixing it into memes, and it will make a lot more money than the second-most popular movie, even though there's no discernible quality difference.
        • Asraelite 14 hours ago
          It will basically be like the rise of indie games. Every now and again you get something like Among Us which is low quality but good enough to be enjoyable and with the right combination of luck and timing it becomes insanely popular.
          • Wowfunhappy 13 hours ago
            Not just Among Us. You also get Minecraft!
      • bufferoverflow 5 hours ago
        Unless something radically changes, we're quite far from creating movies on demand. Most AI video generators cost ~$1-10 per minute. And generally it takes many attempts to generate a few seconds of anything that's not completely trash.

        Another issue is quality. Most of these AI generators output quite blurry 720p. If you want proper 4K output, we're at least a couple of doublings away.

        I think we will have some decent AI-generated animations next year, because 2D cartoons are relatively easy to upscale.

        • eMPee584 4 minutes ago
          There's a Blender MCP addon available for folks who already have a clue how to make use of that.
      • GloamingNiblets 13 hours ago
        A good parallel is writing books. Books can cost little to write and publish, but their success is Pareto distributed, not Normally distributed.
        • mlboss 10 hours ago
          Writing a book is really expensive. You have to think and put words on paper that engage the reader. It is really hard.
      • barrenko 14 hours ago
        To quote Nikita Bier, never underestimate how many people just want to watch Netflix and die.
        • greesil 4 hours ago
          Am I the only one who can't stand Netflix's deluge of content? They occasionally had something good, but it's like once every two years.
          • klondike_klive 26 minutes ago
            You just need to drastically recalibrate your definition of good.
          • barrenko 20 minutes ago
            It's an AI slop factory, and it's not going to get any better.
      • panarky 13 hours ago
        That was the story with CGI too, that there would be overwhelming supply that drives prices and value toward zero.

        And yet Marvel exists.

        Turns out in a world of infinite supply, value comes from story, character, branding, marketing and celebrity. Those factors in combination have very limited supply and people still pay up.

        I don't see any reason why AI-gen video is any different.

        • tracerbulletx 11 hours ago
          It's still quite difficult and extremely time consuming to create a visual effect, and the technique to film actors and blend them in is additionally quite difficult. If you get to the point where one person can make a movie, yes, you will be limited by your own creativity, but the number of people who can do that is still a lot greater than the number of people who can do that while also managing a 200 million dollar production and delivering an end product that meets their vision.
      • nmilo 14 hours ago
        It will be like YouTube. Distribution will be hard and most of it will be slop but every now and then you’ll discover something so good and so creative and it couldn’t have possibly existed before that it makes the whole experience worth it. The best creative works are led by one person and I’m excited to see what people can come up with.
      • googlryas 12 hours ago
        A lot of people can't actually say what kind of movie they want until they see it. And even if there are 100,000 releases every year in every genre, virality will probably still exist: even if it's random, one of those movies is going to get more popular than the rest and then everyone will "need" to see it.
      • mvdtnz 14 hours ago
        Most of us have no idea what movies we want. The most delightful films are a total surprise (other than the drones who watch every Marvel film of course).
    • jddj 14 hours ago
      I came to the same sort of conclusion when watching Kitsune, which I think was one person and VEO https://vimeo.com/1047370252

      Granted, 5 minutes isn't 1h30 but it's not a million miles away either.

      • xrd 14 hours ago
        It's fantastic.

        I just watched Kitsune, thanks for sharing.

        It reminds me why Flow was so good.

        Flow was great because I could see the shader artifacts. It was the opposite of a Disney model, it was not polished and perfect.

        That's why I loved it. Disney would never do a movie with a plot like Flow. They would write and rewrite it and it would be a perfect example of humanity, but totally devoid of the humanity behind it.

        It is ironic that this new coming wave of AI generated (or AI assisted) films feel like they have more human craftsmanship than Disney films, when honestly it is the opposite. Disney has incredible and brilliant animators, but that is all crushed behind the merchandising and gross behemoth of the Disney corporation.

        I used to love seeing independent films. Those art house theaters really only exist in places like Portland, OR these days. But, I'm excited about the next wave of film because it'll permit small storytelling, and that's going to be great.

      • msabalau 11 hours ago
        Kitsune is great!

        I've been a VideoFX tester, and have made a couple of five minute shorts. You end up having to generate a lot of shots that you throw away. This is a lot easier to bear if you are a tester without really strict monthly limits, or having to pay to get past them.

        Also, there are all sorts of things you have to juggle or sidestep related to character consistency and sound synchronization. There'll be all sorts of improvements there, but I suspect getting to 90 minutes isn't really a question of spending more time and generations. Right now I think a strong option for solo aspiring AI filmmakers is to work on a number of small projects to master the art, and tackle longer projects when the tooling is better.

      • gh0stcat 13 hours ago
        This is actually so amateurish and cliche it's painful. The fact people like this shows that art never had a chance when the masses have no taste. This makes me depressed for artists and the future.
        • vo2maxer 5 hours ago
          Observations like these remind me of The Académie des Beaux-Arts in France, and more specifically its official Salon (the Salon de Paris), keeping Impressionist painters out of established exhibitions.
        • jddj 13 hours ago
          It won't be novel the 100th or 1000th or millionth time, and standards will rise accordingly. But for now it is, or at least 2 months ago it was.

          Someone created that relatively coherent 5min animated story largely by communicating with a computer in natural language.

          The masses have had plenty worse.

        • x-complexity 57 minutes ago
          > This is actually so amateurish and cliche it's painful. The fact people like this shows that art never had a chance when the masses have no taste. This makes me depressed for artists and the future.

          This kind of rhetoric can best be summed up by one meme: "It's the children who are wrong"

          Spouting off "unwashed masses" prose will only make people hate (snobs + critics + artists by proxy) more, if you're not willing to do your part and stop shooting down beginning attempts as "amateurish and cliche".

          Actually say, **in words**, what directions & improvements can be made.

        • switchbak 12 hours ago
          Well sure, but we're in the early stages here smashing bones together. When a few million bored teenagers bang at this, I bet you'll see perspectives you've never thought of. It'd be like having someone in the 1920's listen to Nirvana - just a completely different experience.

          Given the dreck coming out of Hollywood, I'm open to that, even if other folks have to wade through a million shitty videos for me to get it.

    • hammock 8 hours ago
      https://en.wikipedia.org/wiki/Flow_(2024_film)

      $36 million and an Academy Award. A l m o s t done by just one person. And entirely with open source software.

      The guy's previous movie was a true one-man show but didn't really get screenings: https://en.wikipedia.org/wiki/Away_(2019_film)

    • xnx 15 hours ago
      We've got a pretty good datapoint along that trajectory with Flow. Almost entirely one person and has grossed $36 million. https://en.wikipedia.org/wiki/Flow_(2024_film)
      • jsheard 15 hours ago
        It was a small team for sure but not a one man show, there are 22 credits for the animation work alone, plus 13 more for sound and music, not counting the director.
        • xnx 13 hours ago
          > Almost entirely one person

          It is closer to one than the staff numbers of other animated films. It's a good data point to keep in mind as AI tools enable even smaller teams to do more.

      • karolist 14 hours ago
        Went to the cinema with my kids for the 2nd time to watch this one, was pleasantly surprised to read this movie was done using Blender, highly recommended.
      • mattfrommars 14 hours ago
        One person? What do you mean? It literally says in the wiki more than one.

        This isn't solo dev game project.

    • NitpickLawyer 15 hours ago
      I think you might need qualifiers on that. Are we talking an unknown / unrelated person living in the proverbial basement, or are we talking a famous movie director? I could see Spielberg or Cameron managing to make something like that happen on their name + AI alone.

      If we're talking regular people, the best chance would be someone like Andy Weir, blogging their way to a successful book, and working on the side on a video project. I wouldn't be surprised if something along these lines happens sooner or later.

    • SirMaster 13 hours ago
      Well text generation is way ahead of video generation. Have we seen anyone create something like a best selling or high grossing novel with an LLM yet?
      • delichon 12 hours ago
        That's why going from one person to zero persons will be so hard. But one Kubrick/Carmack and a bunch of AI could make a compelling movie now.
    • bookofjoe 12 hours ago
      Me too. Sam Altman recently predicted that we will see a one-person unicorn company in the near future.
    • silksowed 15 hours ago
      Very excited to play around. Will be attempting to see if I can get character coherence between runs. The issue with the 8s limit is it's hard to stitch them together if characters are not consistent. Good for short form distribution but not YouTube mini series or eventual movies. Another comment about IP licensing is indeed an issue, but it's why I am looking towards classical works beyond their copyright dates. My goal is to eventually work from short form, to YouTube, to eventual short films. Tools are limited in their current form but the future is promising if I get started now.
    • colesantiago 14 hours ago
      My prediction is on track to this and this was made only 4 months ago.

      https://news.ycombinator.com/item?id=42368951

      • delichon 14 hours ago
        There may be a solo (not Han) movie good enough to compete in five years, but I doubt that Academy voters will be that welcoming of the tech that can obliterate most of their jobs by then.
        • switchbak 12 hours ago
          If AI can fix the terrible ending that was Game of Thrones, then perhaps it won't have been a complete waste after all.
        • kridsdale1 13 hours ago
          Based on the training data being pop culture, we may even get a good Han Solo movie from tools like this. Starring young Ford.
    • kevingadd 15 hours ago
      I think the obstacles there are distribution and IP rights. I think we will see content like that find widespread appeal and success but actually turning it into $100m in revenue requires having the copyright (at present, not possible for AI-generated content) and being able to convince a distributor to invest in it. Those both seem like really tough things to solve.
      • delichon 14 hours ago
        Purely AI-generated content -- with no human authorship -- is not eligible for US copyright protection. However if a human contributes meaningfully to the final output (editing, selection, arrangement, etc.) it becomes eligible. See Thaler v. Perlmutter (2023).
        • Workaccount2 14 hours ago
          >is not eligible for US copyright protection

          Once industry adopts AI generation, which it will, a new law will be quickly signed.

          In a way, not allowing copyright of AI material really only serves a tiny group of people. "We want to empower everyone to bring their ideas to market, not just those with the ability to draw them" is not a particularly evil or amoral sentiment.

          • gh0stcat 12 hours ago
            As if the ability is not attainable, people want to be put on top of the mountain without any effort.
            • Workaccount2 12 hours ago
              When society climbs mountains, they eventually build elevators. It's a core functionality and the reason why we are so advanced. Just take a moment to realize how many peaks you already sit on top of without even thinking about it. Your home is overflowing with cheap wares from mountains ascended ages ago that you now have "no effort" access to.
      • r58lf 11 hours ago
        Yeah, people underestimate how hard it is to get a movie into a theatre AND get people to pay for a ticket.

        Hollywood can barely get any well-made movies past $100 million these days unless it's based on some well-known franchise (Minecraft, Captain America, Snow White) or it has some well-known actor.

  • xnx 15 hours ago
    This is amazing. I wouldn't think that something as computationally expensive as generating 8 second videos would be available outside of paid API anytime soon.
  • torginus 13 hours ago
    I am not really technical in this domain, but why is everything text-to-X?

    Wouldn't it be possible to draw a rough sketch of a terrain, drop a picture of the character, draw a 3D spline for the walk path, while having a traditional keyframe style editor, and give certain points some keyframe actions (like character A turns on his flashlight at frame 60) - in short, something that allows minute creative control just like current tools do?

    • nodja 12 hours ago
      Dataset.

      To train these models you need inputs and expected output. For text-image pairs there exist vast amounts of data (in the billions). The models are trained on text + noise to output a denoised image.

      The datasets of sketch-image pairs are significantly smaller, but you can finetune an already trained text->image model using the smaller dataset by replacing the noise with a sketch, or anything else really; the quality of the output of the finetuned model will highly depend on the base text->image model. You only need several thousand samples to create a decent (but not excellent) finetune.

      You can even do it without finetuning the base model, by training a separate network that applies on top of the base text->image model's weights. This allows you to have a model that can essentially wear many hats and do all kinds of image transformations without affecting the performance of the base model. These are called controlnets and are popular with the stable diffusion family of models, but the general technique can be applied to almost any model.
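
      A hedged sketch of that controlnet idea with the diffusers library (public checkpoints; the scribble file is a made-up placeholder):

        import torch
        from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
        from diffusers.utils import load_image

        # Conditioning network trained on scribble->image pairs, applied on top of
        # a frozen Stable Diffusion base model.
        controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-scribble",
                                                     torch_dtype=torch.float16)
        pipe = StableDiffusionControlNetPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
            torch_dtype=torch.float16).to("cuda")

        sketch = load_image("rough_terrain_sketch.png")  # hypothetical hand-drawn sketch
        result = pipe("a misty mountain valley at dawn, cinematic lighting",
                      image=sketch, num_inference_steps=30).images[0]
        result.save("terrain.png")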

      • indexerror 1 hour ago
        These datasets would definitely have a lot of Text => Sketch pairs as well. I wonder if it's possible to extrapolate from Text => Sketch and Text => Image pairs to improve Sketch => Image capabilities. The models must be doing some version of this already.
    • spyder 1 hour ago
      Huh, "everything text-to-X"? Most video gen AI has an image-to-video option too, either as a start or end frame or just as a reference for subjects and environment to include in the video. Some of them even have video-to-video options, to restyle the visuals or reuse motions from the reference video.
    • minimaxir 12 hours ago
      Everything is text-to-X because it's less friction and therefore more fun. It's more a marketing thing.

      There are many workflows for using generative AI to adhere to specific functional requirements (the entire ComfyUI ecosystem, which includes tools such as LoRAs/ControlNet/InstantID for persistence) and there are many startups which abstract out generative AI pipelines for specific use cases. Those aren't fun, though.

    • Rebelgecko 13 hours ago
      You can do image+text as well (although maybe the results are better if you do raw image to prompted image to video?)
    • wepple 9 hours ago
      LLMs were entirely text not that long ago.

      Multi modality is new; you won’t have to wait too long until they can do what you’re describing.

    • fragmede 1 hour ago
      Image-to-image and speech-to-speech exist; yes, almost everything is text-to, but there are exceptions.
    • TacticalCoder 8 hours ago
      I want ...-to-3D-scene. Then I can use Blender to render the resulting picture and/or vid. Be it "text-to-3D-scene" or "image-to-3D-scene".

      And there's a near infinity of data out there to train "image-to-3D-scene" models. You can literally take existing stuff and render it from different angles, different lighting, different background, etc.

      I've seen a few inconclusive demos of "...-to-3D-scene" but this is 100% coming.

      I can't wait to sketch out a very crude picture and have an AI generate me a 3D scene out of that.

      > ... in short, something that allows minute creative control just like current tools do?

      With 3D scenes generated by AI, one shall be able to decide to just render it as is (with proper lighting btw) or have all the creative control he wants.

      I want this now. But I'll settle with waiting a bit.

      P.S: same for songs and sound FX by the way... I want the AI to generate me stuff I can import in an open-source DAW. And this is 100% coming too.
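
      A minimal Blender (bpy) sketch of that "render existing scenes from many angles to make training pairs" idea; orbit radius, camera height, view count and output paths are all assumptions:

        import math
        import bpy

        scene = bpy.context.scene
        cam = scene.camera  # assumes the scene already has an active camera
        radius, height, n_views = 8.0, 3.0, 12

        for i in range(n_views):
            angle = 2 * math.pi * i / n_views
            # Orbit the camera around the origin at a fixed height.
            cam.location = (radius * math.cos(angle), radius * math.sin(angle), height)
            # Crude aim-at-center: tilt down ~70 degrees, yaw toward the origin.
            cam.rotation_euler = (math.radians(70), 0.0, angle + math.pi / 2)
            scene.render.filepath = f"/tmp/view_{i:03d}.png"
            bpy.ops.render.render(write_still=True)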

  • smallnix 14 hours ago
    Brave to make ads with the Ghibli style. Would have thought that's burned by now.
    • gh0stcat 13 hours ago
      No one has any morals or soul at this point. It's all garbage in, garbage out.
    • minimaxir 14 hours ago
      Looking at the video, I think there's shenanigans afoot. The anime picture they input as a sample image is more generic anime, but the example output image is clearly Ghibli-esque in the same vein as the 4o image generations.
  • deyiao 8 hours ago
    Content moderation is incredibly frustrating — it might even be the key reason why Veo2 and even Gemini could ultimately fail. I just want to make some fun videos where my kid plays a superhero, but it keeps failing.
    • itake 8 hours ago
      I have the same issues with OpenAI. Supposedly Grok is better, but their quality isn't as high.
  • byearthithatius 14 hours ago
    Very impressive release compared to what was possible even a single year ago. It feels like we are in a great state right now with respect to ML where all the big companies are competing and pushing each other to make the tech better. This is rare nowadays in America (or in general).
  • volkk 13 hours ago
    this is semi-relevant -- and I do love how technically amazing this all is, but a massive caveat from someone who's been dabbling hard in this space (images+video) -- I cannot emphasize enough how draining text-2-<whatever> is. even when a result comes out that's kind of cool, I feel nothing because it wasn't really me who did it.

    I would say 97% of the time, the results are not what I want (and of course that's the case, it's just textual input) and so I change the text slightly, and a whole new thing comes out that is once again incorrect, and then I sit there for 5 minutes while some new slop churns out of the slop factory. All of this back and forth drains not only my wallet/credits, but my patience and my soul. I really don't know how these "tools" are ever supposed to help creatives, short of generating short form ad content that few people really want to work on anyway. So far the only products spawning from these tools are tiktok/general internet spam companies.

    The closest thing that I've bumped into that actually feels like it empowers artists is https://github.com/Acly/krita-ai-diffusion that plugs into Krita and uses a combination of img2img with masking and txt2img. A slightly more rewarding feedback loop

    • dsign 14 minutes ago
      > So far the only products spawning from these tools are tiktok/general internet spam companies.

      Help me here. If tiktok becomes filled with these, will it mean that watching tiktok "curated" algorithmic results will be about digesting AI content? Like, going to a restaurant to be served rubber balloons full of air that then people will do their best to swallow whole?[^1]. Could this be it? The demise of the algorithm? Or will people just swallow rubber balloons filled with air?

      [^1]: Do please use this sentence as a prompt :-)

    • justlikereddit 12 hours ago
      [dead]
  • kumarm 8 hours ago
    Pretty disappointed with content moderation on Veo2. Here are the steps I did:

    1. Took a picture of me and asked to describe person in the image.

    2. Used Imagegen to create a cartoon version using that description.

    3. Tried to use veo-2.0-generate-001 to generate video of person in image (holding a coffee cup in original image) drinking coffee and having a conversation.

    Video generation is blocked by content moderation.

  • anonzzzies 5 hours ago
    I have Advanced but no Veo2 model; is it controlled rollout or something again?
  • bredren 8 hours ago
    The UI on this product page does not make any sense to me. The three prompt workflows don’t stack in any obvious way, then seemingly combine on any submission to the main prompt area?

    They generate independent images.

    Gemini’s web interface is also way behind chatgpt and Claude. The mobile app is even worse.

    This is all while having the champ 2.5 Pro model in their pocket.

    It seems that web product resources are not being adequately allocated to the AI group(s).

  • bk496 13 hours ago
    I wonder what takes more compute power: this or a blender render farm?
  • wewewedxfgdf 12 hours ago
    1: Press release about amazing AI development.

    2: "Try it now!" the release always says.

    3: I go try it.

    4. Doesn't work. In this case, I give it a prompt to make a video and literally nothing happens; it goes back to the prompt. In the case of the breathtakingly astonishing Gemini 2.5 Coding, attaching a source code file to the prompt gives "file type not supported".

    That's the pattern - I've come to expect it and was not disappointed with Google Gemini 2.5 coding nor with this video thing they are promoting here.

    • throwup238 12 hours ago
      On the contrary I had completely written off Google until a few days ago.

      Gemini 2.5 Pro is finally competitive with GPT/Claude, their Deep Research is better and has a 20/day limit rather than 10/month, and now with a single run of Veo 2 I’ve gotten a much better and coherent video than from dozens of attempts at Sora. They finally seem to have gotten their heads collectively unstuck from their rear end (but yeah it sucks not having access).

    • martinald 11 hours ago
      I really don't know why Google especially seems to struggle with this so much.

      While Google have really been 'cooking' recently, every launch they do is like that. Gemini 2.5 was great but for some reason they launched it on web first (which still didn't list it) then a day or so later on app, at which point I thought it was total vapourware.

      This is the same - I have a Gemini Advanced subscription, but it is nowhere to be seen on mobile or in the app. If you're having scale/rollout issues, how hard is it to put the model somewhere and say 'coming really soon'? You don't know if it's not launched yet or you are missing where to find it.

    • siva7 12 hours ago
      you're using it wrong. change file ending to .txt instead
      • bornfreddy 12 hours ago
        I can't tell if this is sarcasm or helpful advice?
        • Workaccount2 11 hours ago
          It's how you have to do it. The Gemini model is excellent, but the implementation/chat environment seems like it was thrown together in a weekend as an afterthought.

          You cannot upload a .py file, but if you change the name to "main.txt" you can upload it, and it will automatically treat it as "main.py". Not sure how this hasn't been fixed yet, but it is google so...
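
          A tiny convenience sketch of that workaround (the project directory is a placeholder):

            # Copy .py files to .txt so the Gemini web UI will accept them.
            from pathlib import Path
            import shutil

            for src in Path("my_project").rglob("*.py"):
                shutil.copy(src, src.with_suffix(".txt"))  # e.g. main.py -> main.txt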

  • hu3 12 hours ago
    Is there a tool to generate AI videos that doesn't change the original picture so much?

    Whisk redraws the entire thing and it barely resembles the source picture.

  • ninininino 12 hours ago
    As usual with Gen AI the curated demo itself displays misunderstanding and failure to meet the prompt. In the "Glacial Cavern" demo, the "candy figures" are not within the ice walls but are in the foreground/center of the scene.

    These things are great (I am not being sarcastic, I mean it when I say great) if and only if you don't actually care about all of your requirements being met, but if exactness matters they are mind-bogglingly frustrating because you'll get so close to what you want but some important detail is wrong.

    • dsign 0 minutes ago
      Indeed.

      Even a bad VFX artist has so much more control over what they do. I think that the day "text-to-video" reaches the level of control that said bad VFX artist has from week one, it will be because we have sentient AIs which will, for all intents and purposes, be people.

      That's not to say that there is no place for AI-generated content. Worst case scenario, it will be so good at poisoning the well that people will need to find another well.

  • Zopieux 9 hours ago
    [flagged]
  • Shadow360 14 hours ago
    [flagged]
  • transformi 15 hours ago
    [flagged]
  • strangattractor 14 hours ago
    Google is the new Microsoft in the sense that they can Embrace, extend, and extinguish their competition. No matter what xAI or OpenAI or "anything"AI tries to build Google will eventually copy and destroy them at scale. AI (or A1 as our Secretary of Education calls it) is interesting because it is more difficult to protect the IP other than as trade secrets.
    • mritun 13 hours ago
      > Google will eventually copy…

      Weird take given that Google basically invented the modern deep learning stack which all others build on, and released it through well-written papers and open-source software.

      Google was being dissed because they failed to make any product and were increasingly looking like a Kodak/Xerox one-trick pony. It seems they have woken up from whatever slumber they were in.

      • Workaccount2 12 hours ago
        They didn't entirely drop the ball since they did develop TPUs in anticipation of heavy ML workloads in the future. They tripped over themselves getting an LLM out, but quickly recovered primarily because they didn't have to run to nvidia and beg for chips like everyone else in the field is stuck doing.
      • strangattractor 12 hours ago
        Like MS, Google is ubiquitous - search is much like Office, and DOS before that. Anything OpenAI or the other AI competitors create would normally be protected by patents, for instance. Not so with AI models. Google has the clout/know-how to respond with similar technology - adding it to their ubiquitous search. People are both lazy and cheap. They will always go with cheaper and good enough.
        • harrall 12 hours ago
          Google invented the technology.

          https://en.m.wikipedia.org/wiki/Attention_Is_All_You_Need

          OpenAI was the copycat.

          If Google had patented this technique, OpenAI wouldn’t have existed.

          • strangattractor 10 hours ago
            How do you patent it? What specific "practical, real-world application" does AGI purport to solve? All these algorithms work by using massive amounts of data. They all do it the same way or close to the same way.

            "Algorithms can be patented when integrated into specific technical applications or physical implementations. The U.S. Patent Office assesses algorithm-based patent applications based on their practical benefits and technological contributions.

            Pure mathematical formulas or abstract algorithms cannot be patented. To be eligible, an algorithm must address a tangible technical challenge or enhance computational performance measurably.

            Patenting an AI algorithm means protecting how it transforms data into a practical, real-world application. Although pure mathematical formulas or abstract ideas aren’t eligible for patents, algorithms can be embedded in a specific process or device." [1]

            [1] https://patentlawyer.io/can-you-patent-an-algorithm/#:~:text...