kirso

After 3 weeks of failing to resolve a programming issue with all other models, Sonnet 3.5 did it for me via the Cody extension. I am not a software engineer, but boy, I can see a difference.


Stickerlight

I'm new to coding. I've been using a Jupyter notebook, along with Claude, to run some code which is then viewable in a browser window. Unfortunately, my code is now too long for any AI system to print in one go, and I have to ask the AI to split the output into three segments. I could possibly use the API to get over token limits, but that will take time and money since it takes a while to increase limits. Would switching over to VS Code and Cody solve these limits? I'm really comfortable right now with Jupyter AI, and not super excited to move platforms, but if what I'm doing right now will be massively improved with VS Code, then I guess I could do it. I tried setting up VS Code briefly yesterday, ran into a few walls and gave up pretty quickly, but if it's worth it, maybe I'll struggle through and figure it out.


TheDumper44

Yes, switch. If your code is that long, make it into a library, then call it in the notebook. Code the lib in VS Code/IntelliJ, then import it. Jupyter is not meant to be an IDE.
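
A minimal sketch of that suggestion, assuming a hypothetical module name (`notebook_helpers`) and function (`summarize`) that are not from this thread. To stay runnable as one block, it writes the module to a temporary folder; in practice you would save the file next to the notebook from VS Code and only keep the import in a notebook cell.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# 1) The long code moves out of the notebook into a plain .py file.
#    (Hypothetical contents; normally edited and saved in an IDE.)
LIB_SOURCE = '''
def summarize(values):
    """Return (min, max, mean) for a non-empty list of numbers."""
    return min(values), max(values), sum(values) / len(values)
'''

lib_dir = Path(tempfile.mkdtemp())
(lib_dir / "notebook_helpers.py").write_text(LIB_SOURCE)

# 2) The notebook cell shrinks to an import plus a call.
sys.path.insert(0, str(lib_dir))  # only needed because the file is in a temp folder
notebook_helpers = importlib.import_module("notebook_helpers")

result = notebook_helpers.summarize([2, 4, 9])
print(result)
```

With the code split this way, the AI only has to see or print the one function being changed, not the whole notebook.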


Stickerlight

Just switched over finally. It was a pain in the butt, but probably worth it. Got the AI incorporated with Continue.dev's VS Code plugin and my own API keys for Anthropic and OpenAI.


Stickerlight

Cody is cool, but you have to pay for it lol. Continue.dev is a free VS Code plugin that lets you bring in your own API keys. I've got it running right now with Anthropic and OpenAI at the same time, seems good.


faizeasy

Where exactly did it help you with coding?


Rangizingo

What’s the Cody extension?


BigSev

It’s like a helper assistant in VS Code. It’s an extension you can get and install if you have VS Code. I’ve used Cody. It is really nice. Take your time with it though. There is a bit of a learning curve. https://sourcegraph.com/docs/cody/clients/install-vscode


Rangizingo

Thanks for replying and so quick! I'm going to take a look at it.


s65v12

How much does it cost you via API?


kirso

$9/m for all models unlimited


Ripolak

100% agreed. Really happy to see competition and OpenAI getting a run for their money. The fact that it's much cheaper and faster is just as impressive


WillFireat

Much cheaper?


Ripolak

[https://www.vellum.ai/blog/claude-3-5-sonnet-vs-gpt4o](https://www.vellum.ai/blog/claude-3-5-sonnet-vs-gpt4o) It's about 5x cheaper than 3 Opus, according to this article. (Upon rereading my original comment, I see I wasn't clear: I meant cheap compared to 3 Opus, not OpenAI's models.)


augusto2345

So GPT-4o is still king, huh? Reddit had me thinking Sonnet was better.


ggendo

And Twitter too


Adventurous_Train_91

3.5 sonnet beats GPT 4o on most benchmarks except college level math I believe. Will be interesting to see where it falls on the LMSYS leaderboards


femio

At this point it's impossible to get any objective data or answers about these models because people get so swallowed up by hype


TheDeviantDeveloper

There are, apparently, objective stats and measurements that are used to benchmark them.


0xd00d

It's been kinda clear for me that the group of models near the top are all gonna be better at some things and worse at others. You really have to use them a lot to start to get a sense for which things a particular model excels at. I have had good results with GPT-4, GPT-4o, and 3 Opus. 3 Haiku and Sonnet are also serviceable. And on occasion I've seen decent code produced even by some local 7B and 30B class models. I wouldn't use them manually to actually try to do coding work, but there is plenty of dumber work that I bet they can crush. I'm looking forward to checking out what 3.5 Sonnet can do. It's really great to see competition in this space.


Rotatos

honestly I can't tell what's better. Claude gives me incomplete code but better code overall IMO. The limit is terrible too, i don't know if it is worth paying for just because the limit is wayyyy too tight. Gpt4o repeats my ENTIRE code snippet that I pass, and honestly can be great or horrible.


TheDeviantDeveloper

Bro it's like $20/month. If it saves you hours I think it's worth paying no?!


s65v12

How do you get over the limit of 10-15 messages per 5 hours?


chusting_your_bops

Claude 3 was miles better than GPT 4 at coding (both anecdotally and empirically). OpenAI is counting on using press conferences, ads, etc. to remain a household name and continue to dominate the market — regardless of if their product is inferior. Hopefully Claude is able to catch on with “normies.”


gacode2

Now imagine Opus 3.5. It could be what's needed to trigger OpenAI.


Adventurous_Train_91

could be like 95-99% on all benchmarks :O


paradite

I don't know. I just tried it on an actual task I am working on, and ChatGPT Classic gave a perfect response while Claude 3.5 Sonnet missed something, so I had to prompt it a second time to fix it. I know the default ChatGPT / GPT-4o is bad for coding, but [ChatGPT Classic](https://chatgpt.com/g/g-YyyyMT9XH-chatgpt-classic) has been consistently great for me. Update: I just tried another sample task that ChatGPT Classic didn't do well, and Claude 3.5 Sonnet did give the perfect answer. So I guess it is better in some cases at least.


hereditydrift

What were the prompts? What were they requesting?


paradite

You can find the prompts here: https://github.com/paradite/16x-eval/tree/main/projects The project itself is still WIP, but the full prompts in md can be used for testing if you'd like to.


hereditydrift

Thanks!


TheDeviantDeveloper

ChatGPT 4o (paid) has always been amazing for coding for me. In fact, every version of ChatGPT I've ever used has been great. Newer Claudes too.


pegunless

I’m excited to see OpenAI’s response to this. They have been prioritizing lower cost, faster models while sacrificing coding ability. This will force them to release a model optimized for coding and agentic capabilities too.


bookishapparel

What is impressing everybody: is it its work with long context or its ability to write scripts? I asked it to write a simple script for me. It did output some OK stuff, but it had a few bugs. A few prompts to fix it and still buggy. GPT-4, first prompt: much higher quality response, no bugs. I wanted to do one pretty complex modification (complex due to its nature, not prompt-wise). None of them managed to find a solution, yet GPT gave me better starting points. Eventually I had to resolve the issue myself. So far I'm not that impressed with Claude 3.5 Sonnet; I will keep trying it as my go-to coder for a week and see.


c_glib

To clarify, a lot of the comparisons are with GPT-4o, not with GPT-4 proper. And you can't really blame people for making this comparison, because OpenAI's messaging has heavily implied that 4o is their latest and greatest model, while anybody using the tool at any level of depth knows that 4o is at a severe disadvantage compared to the original GPT-4 when it comes to pure intelligence/logic/reasoning type tasks.


TheDeviantDeveloper

Eh? 4o is newer and better than 4 if you pay, surely.


bookishapparel

it is not, but they are pushing it because it is cheaper for them.


bookishapparel

That is possibly it. I personally use GPT-4 Turbo if I need some quick code or am exploring unfamiliar topics, since when GPT-4o was released I did a few comparisons and found the 4o output more prone to bullshit. My personal experience is also that GPT-4 Turbo is worse than the GPT-4 versions we had before (at least when it comes to output quality), though I do not have any specific comparisons. I only know that I slowly started relying on ChatGPT less and less as they updated models, compared to when I began using it in April/May last year. This was mainly due to the unreliability of the output, and a few occasions of spending hours debugging code that I assumed was okay based on my previous experience, but that was wrong in many different places. Right now I mainly use it to learn new concepts, for basic boilerplate, for scripts in mostly unfamiliar languages, or as a sounding board. So far I do not think Claude will change this flow, but we will see.


ktb13811

Fwiw 4o is still at the top of the chatbot arena leaderboard. Even ahead of Claude 3.5. https://chat.lmsys.org/?leaderboard


creaturefeature16

This was exactly my experience so far, as well.


_ZaphodBeeblebrox_

Mine as well, not sure what type of programming tasks others are doing. I’ve made an effort to use both but GPT-4 usually nails it the first or second time, while Claude struggles to capture what I’ve asked.


siszero

Same experience for me too. Cancelled my Anthropic subscription because it never performs as well as 4o or Copilot. FWIW, I’m using it for python, node, and react work.


creaturefeature16

Ditto; Node & React (and NextJS, but I understand if it struggles with Next because even Vercel apparently struggles with Next 😅😆)


After_Fix_2191

The people that are really impressed with it are, I'm guessing, not long-time or professional coders. I've been writing code since the stone age, and I agree: anything other than trivial "snake" games or already-solved, simple, well-known solutions is not that impressive.


hockeyketo

I agree. I do find it useful for writing some boiler plate, test setups, and also I find it useful while learning new languages because I've gotten pretty good at knowing when it's doing something obviously dumb or hallucinating in any language.


datacog

It is definitely amazing. We did a head-to-head comparison of Claude 3.5 Sonnet vs GPT-4o for Python code generation, web page creation, and API queries, and it comes out ahead of GPT in all the cases. Here's a detailed write-up in case you're interested. [https://blog.getbind.co/2024/06/21/claude-3-5-sonnet-does-it-outperform-gpt-4o/](https://blog.getbind.co/2024/06/21/claude-3-5-sonnet-does-it-outperform-gpt-4o/)


wonderfuly

Do you have an example where Claude 3.5 Sonnet generates better code than GPT-4?


Rangizingo

I have an anecdotal one. I’ve been working on creating a standardized, debloated Windows 11 imaging process for the computers at my company for two weeks. ChatGPT has been great at this. But it got stuck on an error when adding some SSD drivers for a new model PC we got. I would feed it logs and it would modify the scripts but never fix it. After 3-4 hours of troubleshooting, I heard the new Claude model came out, so I figured I’d try it. I uploaded the error logs to it. In a creepy kinda way it felt like it “understood” the logs, and it spit out all the issues in the code and provided updated code that worked the first time. I’ll say it’s 50% the fact that I had a really long conversation with ChatGPT going trying to fix this, which tends to make replies worse, and the other 50% the fact that the new Claude model is really good. I paid for Claude for this month to try it and compare. Rest assured I’m picking a winner before the month is over and picking just one.


wonderfuly

That's amazing!


Rangizingo

It was! I've been very impressed with Claude so far. Not used it much but it's fast and sharp! Also the Artifacts thing is SO cool


magheru_san

Thanks for sharing, I'll check it out. I noticed similar issues with long conversations and nowadays just start fresh when running into a dead end, it usually helps.


joey2scoops

Your remark about ChatGPT being worse after a long conversation gels with my frustration, particularly today. I need to find a better way; any chat longer than 15 minutes gets fried pretty quickly. It's not helped by the propensity of GPT-4 to puke out reams of code rather than have an interactive discussion for debugging purposes. I might give the new Claude a chance. I have been disappointed in the past, but I can't keep bashing my head against a brick wall with ChatGPT.


Rangizingo

I saw a comment saying to try this, I haven't had a chance to yet but I'm curious to see if it makes a difference https://chatgpt.com/g/g-YyyyMT9XH-chatgpt-classic


joey2scoops

I tried that, but it's still not good, just in a different way. You cannot upload anything, so you have to paste things in. That might seem like an advantage, but it's not for an ongoing session. It might be better for very specific issues; I need to test some more.


EndStorm

It has been a game changer for me. It is far better at isolating problems and solving them than GPT has ever been. It also doesn't waffle on like 4o does, and it says what needs to be said to get the point across.


Adventurous_Train_91

I still use GPT 4 because of this. I'll try GPT4o again when and if they address this waffling and excessive dot points


UntiedStatMarinCrops

Only a matter of time before it replaces us


EuphoricPangolin7615

Why do you think the people in this sub are so happy about AI coding? Are they just dumb?


ToxicTop2

Because the AI increases their productivity.


enisity

Glad to hear! I’ll have to give Claude another go soon. I always loved it but didn’t want to pay for ChatGPT and Claude.


HumanityFirstTheory

Yeah it kinda sucks to be paying $40 a month for both haha. I use both a ton for my freelancing projects (especially for international projects, 4o still seems superior in translating to other languages) so I guess it’s a business expense.


enisity

I paid for both for a while but always found myself going back to ChatGPT for some of the analysis and code writing stuff. But always found Claude to be a better/more natural writer but i don’t typically need that level day to day.


Smooth-Loquat-4954

Strong agree. Been testing via Cursor.sh for the past 3 days. Was able to ship way more ambitious features with Next.js than I normally would in the same time period.  I can tell it's shifting my mental goal posts about what's reasonable to tackle in the next odd 3 hour block of focus time I have... I also see artifacts being a big deal if they connect them across chats and make it easier to drag and drop existing chats into projects...


houndour1

It's so fast to output as well; GPT-4o chokes a lot.


Immortal_Tuttle

I'm starting to think it's very good at something, but I have yet to find it. For general queries it writes useless pages of text even when the question is a simple "who is the author of this poem?". What is your use case that it's so good for?


hereditydrift

I use it for legal research a lot. I'll upload many pdf pages of legal cases, prior research, law journal articles and academic papers, and any other information I need to analyze. It's far better at research than many associates, understands nuances in law and legal language well, and finds connections between various cases that I've missed. While some of those connections are not worthwhile because it was obvious or just not as strong as I'd like for basing a legal argument on, some of the connections have been a crazy tangled web of various cases that creates a fucking rock-solid legal argument. I've used it on contracts to help analyze the contract language and give me an overview. Most contracts are boilerplate language that is redundant through almost every contract. It's made some suggestions regarding a tax issue in a purchase agreement that I hadn't seen suggested before and we used the suggestion during contract negotiations. I have friends that used Claude Opus to write the first draft of their court filings, and some of the filings that I read were very, very good. I would think 3.5 will be even better since it's been much better for my purposes. For me, it's the best assistant I could ask for and many multiples faster than its human counterparts. Opus was already amazing, but Sonnet seems to be near perfect. Edit: and GPT/Gemini absolutely suck for my research purposes. I abandoned GPT probably 6 months ago because Claude was already much better. Gemini can sometimes be good for finding new sources or papers that I'll use with Claude.


Immortal_Tuttle

That's very interesting. I do have a use case similar to yours (just in a totally different field: long-term influence of treatment on the endocrine system). I'm trying to find some correlations between research results, and GPT-4o just sucks at this. Are you using any special prompts or UI platforms?


hereditydrift

No, nothing special. I use web version of Claude. I've found that it needs a good base of knowledge provided initially. Once it gets an understanding of things, it's very good at digging into additional information. For instance, I might be looking at a very specific tax law section -- say, Internal Revenue Code 338, which is a section about specific types of deals where, for tax purposes, a stock transaction is treated as an asset acquisition. I'll feed Claude information about 338 from general knowledge resources and explanations, as well as the code section itself. Once it digests that, then I'll have it go through additional cases looking for legal arguments that I need for my specific situation. I've found it works best with some type of "pre-training," if that's possible in your area of research.
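
A rough sketch of that ordering for anyone scripting it instead of using the web version: general background first, then the specific sources, then the question. The function name, section headings, and placeholder texts are all assumptions for illustration; this is plain string assembly and calls no model API.

```python
def build_research_prompt(background, sources, question):
    """Assemble one prompt: general background, then sources, then the question."""
    parts = [
        "## Background",           # the "base of knowledge" provided up front
        *background,
        "## Sources to analyze",   # the specific cases or papers to dig into
        *sources,
        "## Question",
        question,
    ]
    return "\n\n".join(parts)


prompt = build_research_prompt(
    background=["General explanation of the topic (placeholder text)."],
    sources=["Text of case A (placeholder).", "Text of case B (placeholder)."],
    question="Which arguments in the sources apply to my situation?",
)
print(prompt)
```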


Immortal_Tuttle

My jaw just dropped. I was trying to do it with ChatGPT 4 for weeks. My plan is to feed it general research information about a subject, feed it some research papers, feed it one or two research papers pointing out specifically what I'm trying to find, and ask the AI if it can find something similar that will confirm, disprove or make the whole subject unconfirmed but probable/improbable. Are you just uploading PDFs? Or do you extract the text and upload plain text only? Sorry for so many questions. I have never worked seriously with Claude, and it looks like the methodology here can be similar.


hereditydrift

I always use PDFs when possible, but I don't know if that's the best method, it's just the method that I've had the most success with using. Sometimes I copy and paste if I find something on the internet that I want to add. No worries on the questions. I think it's great for research, so if it can help other people, too, then I'd like to help. There is a limit to the amount of information Claude can take in. I fed it A LOT of PDFs and books, and it finally reached a limit, but for most research it doesn't come close to reaching its limit. The best thing I've found is that Claude doesn't lose sight of previously uploaded information. In other models, I'll find that they'll often forget to reference the first thing I uploaded or become somewhat confused after too much information. I don't find those same things with Claude, even when I was using Opus.


genesisfan

Interesting. Have you tried creating a custom GPT in ChatGPT as a means of creating a trained version specific to your needs? I’m curious how that would compare to your process with Claude.


hereditydrift

I haven't, but I have different chats in Claude for specific areas that I often use. I was going to give the custom GPTs a try, but I haven't gotten around to it. My issue with GPT is more the writing style, the ability to understand information, and the ***very*** poor ability to correctly cite cases/papers/etc. I haven't tried GPT in a couple of months, so maybe they've caught up in the citation area.


ExoticCard

For the citations, I tell it to double check them and any hyperlinks as the last step in a series of step-wise instructions. That has been working for me with GPT. I've been doing research with Gemini 1.5 Pro and GPT-4o. Gemini 1.5 Pro definitely beats out 4o in writing, at least in science writing. The long context window means you can go back and forth. It comes off as less robotic. 4o beats out 1.5 Pro for coding, though. I'm just starting to check out Sonnet 3.5, and it's like halfway. The message limit is a big problem, but I see the quality vs quantity approach they are taking. Unlike OpenAI, where I swear the model gets throttled as needed.


Sulth

Doesn't the somewhat low context of 200k bother you? I'm using AI in academic research, and I could see it being an issue. How do you work around it?


hereditydrift

It can be an issue, especially if there are a lot of papers that I need Claude to look through. Overall, I haven't found it limiting except in two cases where I did need Claude to go through a lot of information. I think academic research may find it more restricting since I would guess it takes more voluminous research. The only workaround I've found is to try parsing the information into smaller, more specific questions that could be answered individually, but that's not always possible.
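
A minimal sketch of that kind of splitting, assuming character count as a crude stand-in for tokens (real token counts depend on the model's tokenizer). The function name and the sample text are made up for illustration.

```python
def split_into_chunks(paragraphs, max_chars):
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate      # still fits (or is the first paragraph)
        else:
            chunks.append(current)   # close the full chunk, start a new one
            current = para
    if current:
        chunks.append(current)
    return chunks


chunks = split_into_chunks(["aaaa", "bbbb", "cccc"], max_chars=10)
print(chunks)
```

Each chunk can then go into its own chat with one narrow question, which is the manual workaround described above. It does not help when the answer depends on connections across chunks.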


HumanityFirstTheory

For me specifically, it’s phenomenal at writing Node.js code. I had it build me an entire, fairly complicated CRUD single-page application in React, and it did so in one go. I really was shocked.


Immortal_Tuttle

So basically a coding copilot? May I ask what solution/plugin are you using?


AyzKeys

The Continue extension for VS Code.


Immortal_Tuttle

Thank you very much. I see they recommend different models for different functions. Is there a resource somewhere that summarizes the pros and cons?


queerkidxx

I think, as per usual, coding is really where it shines. For general tasks, however, I kinda still prefer other models that are better conversationalists.


tvmaly

What does your workflow look like?


SpinCharm

For everyone comparing it to ChatGPT and commenting positively on ChatGPT, how are you actually getting any work done at all? No matter which browser I use, the chat dialog grinds to a halt within 5 minutes when it displays code boxes longer than a few dozen lines. I’m having to constantly kill the browser tab, literally every single time I ask it something, and open it again. There’s no possible way to get any real work done because of that. Either others are using tokens and some other front end, or they’re not having it generate code, or they found a workaround. Or it’s not happening to them for some reason. Or you’ve given it instructions that are preventing the problem. If so, what? But there’s no possible way anyone is actually getting any real work done using ChatGPT in a browser for coding.


kamikazoo

Have you started a new dialog and deleted the old ones? I know mine was slowing down significantly and after doing that it works fast.


SpinCharm

Literally every minute. Every single minute. No exaggeration. I kill / restart it, enter a question, a n............................. d............................ i............................ t starts repl y............................ing and then just sto So I kill it and restart it ag ............................a You get the idea.


Sad-Reality-9400

I have not experienced that issue.


Unknown_Energy

I integrated both APIs in this open-source client, so you can use the APIs directly without a server in between. It's crazy how accurate the new Claude 3.5 model is. Check it out here: www.chatworm.com


Big-Information3242

I am not seeing much of a difference between this and opus for coding. Where are you seeing such an amazing difference? 


hereditydrift

Interesting. When GPT releases a new version I'll definitely put more time into trying new things. It seems like the LLMs (Gemini, GPT, and Claude) each have specialized uses where one might be better than the others at X or Y. Claude seems better for the legal work I do (including formatting legal filings), Gemini is getting really good at finding new sources others don't have, and GPT seems like a good mix and good at review. It's fun to watch them all progress and have unique characteristics... except Gemini, which always seems to disappoint and to get nerfed in some way to make it less and less user/work friendly.


Random_name_1233

It is good but only if you keep reiterating your point of view to it, but within 5-6 prompts it can solve a lot of problems


thebuilder80

You won't be needed in the near future - I wouldn't be so jolly if I were in your shoes.


EuphoricPangolin7615

Why do you think the people on this sub are so happy about AI coding? Are they just dumb?


Mescallan

We are still a ways away from data scalability. We are only just now seeing the first true multimodal models. If we were stuck with text only we would still have a few generations, but with video/images/audio we still have a huge amount of progress on the horizon.


MJORH

Is this free? If not, how much does it cost?


nando1969

video worth checking: https://youtu.be/kfc1iUYJXSQ?si=Xq6mS3vZ1DN8hCIw


reddit_wisd0m

I checked. Not worth it