By Judith van Stegeren
At Datakami we try out new models and tools every week to keep up to date with developments in large language models (LLMs) and generative AI. This month I took a first look at Claude, a family of language models by Anthropic.
Anthropic is an American AI company. Their full name is Anthropic PBC, with PBC standing for Public-Benefit Corporation. That means Anthropic's main focus is not solely making money, but also balancing the interests of their stakeholders (often defined as suppliers, employees, and customers) and striving towards a public benefit goal that is part of their company formation documents. In May, Anthropic raised $450 million in Series C funding from a set of investors that includes Spark Capital, Google, Salesforce, and Zoom.
Looking at Anthropic's company website, the above-average focus on research and AI safety catches my eye. They describe themselves as "an AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our customers and for society as a whole." It is in vogue now for AI companies to promote themselves with "Impact all of humanity through AI!" slogans, and we can find similar phrases on OpenAI's website. However, Anthropic has also been in the news lately for being vocal about the negative impact of AI on society.
Claude is Anthropic's main product: a family of language models that incorporates Anthropic's AI research and frameworks for AI safety.
So far, most people have only encountered ChatGPT. Depending on their views of AI and technology, they found it amazing, scary, or underwhelming. Claude is a large language model like OpenAI's ChatGPT. So why is Claude interesting?
For starters, Claude is famous for its large context size. A model's context size determines its input and output size: how much text it can read and how much text it can write. To give you an idea: OpenAI's default GPT-4 model has a context size of about 8,000 tokens. Its special larger-size sibling GPT-4-32k can deal with about 32,000 tokens. Claude boasts a context size of 100,000 tokens, or about 75,000 words, which is roughly 3x the context of GPT-4-32k and more than 12x that of the default GPT-4 model.
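To get a feel for what these numbers mean in practice, here is a quick back-of-the-envelope sketch. It is not a real tokenizer: the words-per-token ratio is just the rule of thumb behind "100,000 tokens is about 75,000 words", and actual token counts vary per text and per model.

```python
# Back-of-the-envelope check: does a document fit in a model's context window?
# Assumes ~0.75 words per token (the rule of thumb behind 100,000 tokens ~ 75,000 words).
# Real tokenizers give different counts depending on the text and the model.

WORDS_PER_TOKEN = 0.75

CONTEXT_SIZES = {
    "gpt-4": 8_000,
    "gpt-4-32k": 32_000,
    "claude-100k": 100_000,
}

def estimated_tokens(text: str) -> int:
    """Estimate a text's token count from its word count."""
    return round(len(text.split()) / WORDS_PER_TOKEN)

def fits_in_context(text: str, model: str, reserve_for_answer: int = 1_000) -> bool:
    """Check whether the text, plus some room for the model's answer, fits in the context window."""
    return estimated_tokens(text) + reserve_for_answer <= CONTEXT_SIZES[model]
```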
I was very curious about Claude because I read it has a very distinctive, friendly tone of voice. Reportedly, it's very different from ChatGPT's stuffy corporate writing style. I also read that it can handle PDF files really well.
Claude is in open beta and can be accessed via the web at claude.ai. Officially it is not available yet in Europe. However, I could access it by running a VPN endpoint in the US.
When you open Claude, there are a few suggested use cases: summarize a PDF document, practice Spanish vocabulary, and explain how a Python game works.
Claude's menu and conversation list:
Processing PDF files is one of Claude's strengths. Claude combines that with a larger-than-average context size, which means it can ingest entire books or reports and do text generation based on the contents. As a researcher, I regularly work with PDF files such as research papers, so I was interested in experimenting with this feature.
One of Claude's demo use cases is summarizing a PDF document of the famous US case Brown vs. Board of Education.
Claude's summary and study questions for Brown vs. Board of Education:
These results look pretty decent, and they're even formatted neatly in a Markdown table. So far there is no noticeable difference in quality compared to ChatGPT.
I wanted to give Claude's PDF processing capabilities a spin myself, so I uploaded two PDF files: one of my academic papers, and about half of my PhD thesis. I own the copyright to both texts, and as their author I know their contents really well. I then asked Claude to summarize them, suggest writing-style improvements, and come up with a few question-answer combinations.
Prompting Claude to summarize one of my academic papers:
Claude's summary of an academic paper on video game corpora:
Claude's answers were pretty good. The first question looks a bit off, but I did not spot any hallucinations in the summary. Claude has also done a decent job of picking out the most important facts from a large amount of text.
Claude's summary of the first half of my PhD thesis (100+ pages):
Summarizing part of my PhD thesis was a bit more problematic. There are no hallucinations in the text, but Claude has difficulty picking the most salient bits from the text. This is not surprising, since the PDF was so long and I did not specify which part of the thesis I was most interested in. A computer science student would probably not have done a better job.
Claude's study questions for my PhD thesis:
For the PhD thesis, I also asked it to come up with a few PhD defense questions.
Claude's PhD defense questions based on chapter 1 of my PhD thesis:
These were pretty good. They could definitely be used by PhD students, supervisors, or committee members as inspiration for a real-world PhD defense. Points for Claude.
So far we've tested the happy flow. Let's make it a bit harder for the LLM. I deliberately asked a few vaguely worded questions, as well as questions that could not be answered from the text. ChatGPT tends to react to these prompts with hallucinated answers. But to my surprise, Claude answered in a very different way.
When asked a vague question, Claude inserts disclaimers about interpretation in the output text:
Claude's output for the vague question included disclaimers like "Based on my reading" and "But these are just my interpretations", which are clear signposts for the subjectivity of the output.
When asked about things that are not clear from the input text, Claude inserts disclaimers about the lack of information in its output text:
For the question that could not be answered using the contents of the PDF, Claude indicated that there was too little information available to answer the question: "I cannot infer...". This is a significant and pleasant difference from ChatGPT. ChatGPT regularly outputs generic disclaimers (such as "As an AI language model, I do not have opinions"), but they are often unhelpful and don't contribute much to the quality or interpretability of the output.
I'll be honest, I fell a bit in love with Claude's tone of voice. Ethan Mollick has said that Claude stands out for its friendly writing style. I agree. Claude's answers were readable and precise, and felt more matter-of-fact and friendly than ChatGPT's.
Claude's UI was a bit clunky during my experiments. It was often slow to load, especially the menu/conversation list. When I tried to upload my thesis in two parts, the PDF uploader failed a few times without a helpful error message. As a result, I'd love to use Claude via the API. As far as I can tell, it's not possible to access Claude's API unless you get accepted for Claude For Business. Datakami is located in Europe, so that might prove difficult as long as Claude is only available in the US and UK.
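For reference, once you do have access, a summarization call through Anthropic's Python SDK would look roughly like the sketch below. I haven't been able to run this myself, so take the model name and the exact SDK interface as assumptions on my part; they may differ per SDK version.

```python
import anthropic

# Sketch only: assumes the `anthropic` Python package and an ANTHROPIC_API_KEY
# environment variable. The model name below is an assumption on my part.
client = anthropic.Anthropic()

paper_text = open("paper.txt").read()  # text extracted from the PDF beforehand

# Claude's prompt format expects alternating Human/Assistant turns.
prompt = (
    f"{anthropic.HUMAN_PROMPT} Summarize the following paper in five bullet points:\n\n"
    f"{paper_text}"
    f"{anthropic.AI_PROMPT}"
)

response = client.completions.create(
    model="claude-v1-100k",        # assumed name of the 100k-context model
    max_tokens_to_sample=500,
    prompt=prompt,
)
print(response.completion)
```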
Once you have API access, you pay 1–3 cents per 1,000 tokens for the regular Claude model, which is significantly cheaper than GPT-4. Regular GPT-4 costs 3–6 cents per 1,000 tokens, and GPT-4 with the 32k context size is even more expensive.
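Using those ballpark figures, a rough cost comparison for a long document looks like this. The prices are the per-1,000-token numbers mentioned above and may not match current pricing; note also that regular GPT-4 could not fit 100,000 tokens in a single call anyway.

```python
# Ballpark cost comparison, in dollars, using the per-1,000-token prices above.
PRICES_PER_1K_TOKENS = {
    "claude": (0.01, 0.03),  # $0.01–$0.03
    "gpt-4": (0.03, 0.06),   # $0.03–$0.06
}

def cost_range(tokens: int, model: str) -> tuple[float, float]:
    """Return the (low, high) cost estimate for processing `tokens` tokens."""
    low, high = PRICES_PER_1K_TOKENS[model]
    return (tokens / 1000 * low, tokens / 1000 * high)

# Example: a 100,000-token input, roughly half a PhD thesis.
for model in PRICES_PER_1K_TOKENS:
    low, high = cost_range(100_000, model)
    print(f"{model}: ${low:.2f}–${high:.2f}")
```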
I've heard that Anthropic's focus on AI safety has one drawback: Claude's safety measures are sometimes too restrictive. Claude will refuse to generate text related to sensitive topics such as sexuality, health(care), nationalities, etc. That means that if you want to use Claude to generate fiction, it can refuse to perform even simple tasks. Up to this point I had only asked Claude to generate non-fiction text, so I hadn't run into this problem myself.
I tested this by asking Claude to write an outline for an erotic story involving characters from two different countries, hoping to trigger at least one safety measure.
Interestingly, once I removed the word "erotic", Claude started responding to the prompt again and was happy to comply. However, it hallucinated that it had refused to generate "inappropriate" content at my request.
Given these results, I would use Claude for use cases like summarizing long PDF documents (papers, reports, books), generating study or interview questions from a document, and writing tasks where a friendly, matter-of-fact tone of voice matters.
Subscribe to our newsletter "Creative Bot Bulletin" to receive more of our writing in your inbox. We only write articles that we would like to read ourselves.