legal research Archives - Thomson Reuters Institute
https://blogs.thomsonreuters.com/en-us/innovation-topics/legal-research/
Thomson Reuters Institute is a blog from Thomson Reuters, the intelligence, technology and human expertise you need to find trusted answers.

Thomson Reuters Best Practices for Benchmarking AI for Legal Research
/en-us/posts/innovation/thomson-reuters-best-practices-for-benchmarking-ai-for-legal-research/
Wed, 12 Feb 2025 15:38:20 +0000

At Thomson Reuters, we do an enormous amount of AI testing in our efforts to improve our customers' ability to move through legal work faster and more effectively. We've noticed increasing interest in AI testing generally, and in benchmarking AI applications for legal research specifically. We've learned a lot in our thousands of hours of AI testing, so we offer the following best practices for anyone considering a new or updated approach to testing or benchmarking AI for legal research.

1. Test for the results you care about most.

This would seem obvious, but we've seen a lot of confusion about it, and if we could only make one recommendation, this would be it. It's foundational for all other recommendations.

If you cared most about determining how long it takes to drive from one place to another, you wouldn't just measure highway time; you'd measure total door-to-door time. If you cared most about car maintenance costs, you wouldn't just measure the cost and frequency of brake repairs; you'd measure total maintenance costs for the whole car.

When it comes to AI for legal research, no LLM or LLM-based solution offers 100% accuracy. Because of that, all answers generated by large language models or LLM-based solutions, even those that use Retrieval Augmented Generation (RAG), must be independently verified.

Some assume verification is a simple matter of checking the sources cited in an AI answer, but this is incorrect. We've seen plenty of examples where an AI-generated answer is wrong, and the cited sources simply corroborate the wrong answer. Verification requires using additional tools (like a citator, statute annotations, etc.) to ensure the answer is correct.

This means every time an AI-generated answer is used for research, there is a three-step process the researcher must engage in: (1) review the answer, (2) review the cited material from the answer, (3) use traditional research tools to make sure the answer and cited material are correct.

When we talk with researchers about research generally and this process specifically, what they care about most is (a) getting to a correct answer or understanding of the relevant law, and (b) the time it takes to get to that correct answer or understanding.

Because of this, the two most important measures are:

  • The percentage of times, using this three-step process, the user gets to the right answer, and
  • The time it takes to complete all three steps

Surprisingly, the percentage of errors in answers at step 1 can have very little impact on the percentage of correct answers the researcher reaches using all three steps, or on the time to complete those steps (unless errors are excessive), as long as citations and links to primary law are good and those primary sources are current and easily verified. Focusing on step one alone is like trying to figure out door-to-door times by measuring highway speeds only: it's not very useful.

For instance, which of the following systems would you rather use?

  • A system where the initial AI answer is 92% accurate, but verification takes 18 minutes on average, and post-verification accuracy is 97%, or
  • A system where the initial AI answer is 89% accurate, but verification takes 10 minutes on average, and post-verification accuracy is 99.9%

It's a clear choice, but there is often a misplaced focus on measuring the first step in the process to the exclusion of steps two and three. Measure what you care about most.
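To make the comparison concrete, here is a minimal sketch in Python of choosing between the two hypothetical systems above by their end-to-end metrics rather than their step-one answer rates. The selection rule (maximize post-verification accuracy, break ties on verification time) is one reasonable policy, not the only one:

```python
# Hypothetical systems from the example above, end-to-end metrics included.
systems = {
    "A": {"initial_acc": 0.92, "verify_minutes": 18, "final_acc": 0.970},
    "B": {"initial_acc": 0.89, "verify_minutes": 10, "final_acc": 0.999},
}

# Prefer higher post-verification accuracy; break ties on verification time.
best = max(systems, key=lambda s: (systems[s]["final_acc"],
                                   -systems[s]["verify_minutes"]))
print(f"Prefer system {best}")  # Prefer system B, despite its lower step-one accuracy
```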

2. Use realistic, representative questions in your testing.

Presumably you want to evaluate AI for the typical legal research you or your organization does. For instance, if you look at the research your organization does and find the questions are roughly 20% simple questions, 60% medium complexity, and 20% very complex or difficult, and that roughly half are questions about IP law and half are about federal civil procedure, then a benchmark testing 90% simple questions about criminal law would not be very helpful to you.
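As a rough sketch of what matching that distribution looks like in practice, the snippet below draws a stratified test set; the `question_bank` structure, labels, and exact mix are illustrative assumptions, not a prescribed taxonomy:

```python
import random

def stratified_sample(question_bank, n_total, mix):
    """Draw a test set whose (complexity, area) mix matches target fractions."""
    sample = []
    for (complexity, area), frac in mix.items():
        pool = [q for q in question_bank
                if q["complexity"] == complexity and q["area"] == area]
        k = min(len(pool), round(n_total * frac))
        sample.extend(random.sample(pool, k))
    return sample

# Target mix from the example: 20/60/20 by complexity, split evenly by area.
mix = {
    ("simple", "ip"): 0.10,     ("medium", "ip"): 0.30,     ("complex", "ip"): 0.10,
    ("simple", "civpro"): 0.10, ("medium", "civpro"): 0.30, ("complex", "civpro"): 0.10,
}
```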

At Thomson Reuters, we model our testing based on the real-world questions we see from our customers every month. For your own testing, focus on the question types that best represent the researchers you're focused on.

Testing mostly simple questions with clear-cut answers is easiest, but if those question types don't represent what your users do most (they don't represent most AI usage in Westlaw), then the results are not particularly helpful. Similarly, overly complex, extremely difficult, nuanced, or trick questions can be useful for testing the limits of a system, but they tend not to be very helpful for most real-world decision making.

3. Test a lot of questions.

In our own testing, we've found that small sets of questions are rarely representative of actual performance on a larger set. Large language models can generate different responses each time, even with identical inputs. Additionally, if responses are long and complex, graders may disagree, even when judging identical responses. For a quick general sense of direction, it's fine to test with a sample as small as 100 or so questions, but for comparing algorithms/LLMs against each other, we strongly recommend checking the results as you grade and testing until the measure of interest stabilizes. For example, if you are running a comparison between two systems to see which is preferred, you would test until the rate at which one system is preferred over the other stops changing dramatically with each new batch of questions. Another guide to the number of questions you should test is the confidence level and interval you want (see the next section).
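A minimal sketch of that stopping rule, assuming a hypothetical `run_batch()` that grades a fresh batch of questions and returns 1 for each question where system A was preferred over system B, else 0:

```python
def test_until_stable(run_batch, batch_size=50, tol=0.01, max_batches=40):
    """Add batches until the preference rate stops moving by more than tol."""
    wins = total = 0
    prev_rate = None
    for _ in range(max_batches):
        results = run_batch(batch_size)      # list of 0/1 preference grades
        wins += sum(results)
        total += len(results)
        rate = wins / total
        if prev_rate is not None and abs(rate - prev_rate) < tol:
            return rate, total               # measure has stabilized
        prev_rate = rate
    return wins / total, total               # budget exhausted; report anyway
```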

4. Calculate and report confidence levels and intervals.

Even with a relatively large set of questions, measurements of accuracy are only so precise. When using these measurements to make decisions, it's important to understand the degree or range of accuracy of the measurement, often referred to as confidence level and confidence interval. You can think of confidence intervals and levels like the margin of error in surveys: they tell you how reliable or repeatable the measurement is expected to be.

For instance, testing AI accuracy based on 200 questions, if you ran the test again with the same questions/answers but different evaluators, or used the same evaluators but with a different 200-question random, representative sample, would you expect the exact same result? Typically, you wouldn't. You'd expect the result to fall within a certain range, so it's important to report that range along with the results so decision makers understand which differences between algorithms/LLMs are meaningful and which are not. The proper way to report this is with confidence intervals and levels. Using standard assumptions, when measuring an error rate of 10% from a sample of only 100 questions, you can be about 95% confident that the true error rate is between 4.1% and 15.9%. This is called a 95% confidence level, and the "+/- 5.9%" is the margin of error. If you measure an error rate of 10% from a sample of 500 questions, the 95% confidence interval would be between 7.4% and 12.6%, or 10% +/- 2.6%.
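These figures can be reproduced with the usual normal approximation for a binomial proportion, margin = z * sqrt(p(1 - p)/n) with z of about 1.96 at the 95% level; a minimal sketch:

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a measured proportion p over n questions."""
    return z * sqrt(p * (1 - p) / n)

for n in (100, 500):
    print(f"n={n}: 10% +/- {margin_of_error(0.10, n):.1%}")
# n=100: 10% +/- 5.9%    n=500: 10% +/- 2.6%
```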

The basic power analysis used to estimate a confidence interval assumes a perfect means of detecting the outcome you are trying to measure. If there is some uncertainty in that detection (e.g., if two independent evaluators disagree about the outcome some percentage of the time), then the margin of error increases. In our example above with 100 questions, a grading process or measurement that's unreliable roughly 5% of the time might increase the margin of error from 5.9% to 7.3%. Note that there are various methods for calculating standard error, and these examples make simplifying assumptions that likely underestimate the confidence intervals observed in practice.
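One simple way to reproduce the 7.3% figure is to treat grader unreliability as an extra binomial variance term added to the sampling variance; this is a rough model that assumes the two noise sources are independent:

```python
from math import sqrt

def margin_with_grading_noise(p, n, q, z=1.96):
    """Margin of error when the grading itself is unreliable a fraction q of the time."""
    return z * sqrt(p * (1 - p) / n + q * (1 - q) / n)

print(f"{margin_with_grading_noise(0.10, 100, 0.05):.1%}")  # ~7.3%, up from 5.9%
```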

5. Use a combination of automated and manual evaluation efforts.

Having human evaluators pore through lengthy answers to complex questions can be difficult and time-consuming. Ideally, we would just have AI evaluate the accuracy and quality of answers generated by AI. This is sometimes referred to as LLM-as-judge. But in the same way that AI makes mistakes when generating an answer, it can also make mistakes when evaluating the quality of an answer against a gold-standard answer written by a human. In our experience, modern LLMs are pretty good at evaluating AI-generated answers against gold-standard answers when answers are clear and relatively short. As answers grow in length and complexity, we've found the LLM-as-judge approach to be very unreliable.

For instance, research has shown that LLMs tend to struggle when evaluating responses to complex and challenging questions, such as those requiring expert knowledge, reasoning, and math.

Since most test sets will contain a sample of simple/easy/clear questions and answers, it makes sense to use AI for automated evaluation of these, then use human evaluators for the rest, at least until AI improves to the point where more can be automated.
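A sketch of that routing, assuming a hypothetical `llm_judge()` grader and a human review queue, with answer length as a crude stand-in for clarity and complexity:

```python
def route_for_evaluation(answer, gold, llm_judge, human_queue, max_words=150):
    """Automate grading of short, clear answers; queue the rest for humans."""
    if len(answer.split()) <= max_words and len(gold.split()) <= max_words:
        return llm_judge(answer, gold)       # automated grade for the easy cases
    human_queue.append((answer, gold))       # long/complex: human evaluators
    return None
```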

6. For human grading, use two separate human evaluators for each answer, and have a third (ideally more experienced) evaluator to resolve conflicts.

For assessments like these, inter-rater disagreement can be a real issue. In our own testing, we've found attorneys evaluating AI-generated answers to more complex legal research questions can disagree about the accuracy or quality of answers about 25% of the time, which makes single-grader evaluation unreliable. To improve reliability, we have two evaluators separately grade each answer, and where there are conflicts, a third, more experienced evaluator resolves the conflict.
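In code, the workflow is simple; a minimal sketch, assuming each grader returns a verdict string and `senior_grader` is called only on conflicts:

```python
def resolve_grade(grade_a, grade_b, senior_grader, answer):
    """Two independent grades per answer; a senior grader breaks conflicts."""
    if grade_a == grade_b:
        return grade_a                       # graders agree: accept the verdict
    return senior_grader(answer)             # conflict: escalate to senior grader
```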

7. When answers are wrong, investigate to see if the gold-standard answer might be wrong.

In the same way people make mistakes in evaluating answers, they can also make mistakes in coming up with the gold-standard answer for testing. In our experience, we've found some instances where the AI-generated answer was evaluated as incorrect when compared to the gold-standard answer, but when we dug into it further, it turned out the AI was correct and the person who put together the gold-standard answer was wrong. Sometimes AI makes mistakes and sometimes humans make mistakes: you should check both.

8. If evaluating multiple algorithms/LLMs/solutions, make sure the evaluators are blind to which algorithm/LLM/solution the answer was generated by.

In our evaluations, we try to avoid human bias in grading. Sometimes an evaluator has had bad or great experiences with a certain product or LLM in the past, and we don't want them to bring that bias to the current evaluation. So when evaluating different solutions, we first strip away anything that would identify the source of the solution, ensuring results are not biased by past positive or negative experiences.

9. Grade the value of answers in addition to making a binary determination of whether the answer has an error.

What's right or wrong in an answer can vary enormously in terms of positive value and negative impact. For instance, consider the following answers:

A. Answer is correct in every way but is short and high level. It just gives a basic description of the legal issue as it relates to the question but doesn't provide any references to primary or secondary law for verification, nor any nuance regarding exceptions or other considerations.

B. Answer is lengthy and nuanced, addressing multiple aspects of the question and discussing important exceptions that might apply. It provides references with citations and links for verification, and it is correct in every way except that one citation has an incorrect date, which is easily verified and corrected by clicking the link in the citation.

C. Answer is incorrect in every way, and all its linked references point to primary law that simply corroborates the wrong answer.

If the evaluation is simply a binary view of the number of answers that contain an error, then answer A looks good and answers B and C look equally bad. In reality, answer C is far worse and more harmful than answer B, and answer B is likely much more valuable to the researcher than answer A.

In our evaluations, we're looking for answer attributes that are helpful to researchers, like depth of the answer and quality of the references, and we don't just evaluate errors in a binary way. We consider answers that are totally wrong to be far worse than answers with erroneous statements embedded in otherwise correct and helpful answers. Similarly, we weigh erroneous statements based on whether they address the core question or are tangential to it, and whether they're contradicted within the answer or easily verified with the linked references. We'd like to eradicate all errors, of course, but some are more harmful than others.
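A sketch of what non-binary scoring might look like; the attribute names and weights below are illustrative assumptions, chosen so that a totally wrong answer (C) scores far below a deep answer with a minor, easily verified slip (B):

```python
# Illustrative severity weights: tangential or self-contradicted errors are
# penalized lightly; answers that are entirely wrong are penalized heavily.
SEVERITY = {"none": 0.0, "tangential": 0.2, "contradicted_in_answer": 0.3,
            "core": 1.0, "entirely_wrong": 3.0}

def answer_score(depth, reference_quality, error_type="none"):
    """Value of the answer (0..1 ratings) minus the harm of its worst error."""
    return depth + reference_quality - SEVERITY[error_type]

print(answer_score(0.2, 0.0))                    # A: correct but shallow ->  0.2
print(answer_score(0.9, 0.9, "tangential"))      # B: deep, minor slip    ->  1.6
print(answer_score(0.5, 0.0, "entirely_wrong"))  # C: wholly wrong        -> -2.5
```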

10. Look for errors beyond gold-standard answers.

Often LLMs generate answers with information beyond the scope of a gold-standard answer. For instance, the gold-standard answer might specify that the correct answer to the question is no, that the explanation should cover X, Y, and Z, and that it should specifically cite cases A and B and statute C.

The LLM-generated answer might state the answer is no and explain X, Y, and Z with references to A, B, and C, but it might also add a few statements about exceptions or related issues or an additional case or statute. Sometimes these additional statements are incorrect, even when everything else is correct. So, if an LLM-as-judge or human evaluator only looks at the gold-standard answer to see if the AI-generated answer is correct, that evaluation can miss errors in the additional material. This means evaluators need to do independent research beyond simply looking at the gold-standard answers to determine if an answer has an error.
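A sketch of separating in-scope from out-of-scope statements so that the extra material gets independently researched; `is_supported_by()` stands in for an LLM call or a human judgment and is an assumption, not a real API:

```python
def statements_needing_research(answer, gold_answer, is_supported_by):
    """Split an answer into statements and flag those the gold answer doesn't cover."""
    statements = [s.strip() for s in answer.split(".") if s.strip()]
    return [s for s in statements
            if not is_supported_by(s, gold_answer)]  # verify these independently
```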

11. Consider testing reliability.

LLMs often have some randomness built into them. Many have a temperature setting that can be used to minimize or eliminate this, making answers more consistent when asking the same question multiple times.

But some LLMs are better at this than others, and some integrated solutions that use LLMs in conjunction with other techniques, like RAG, deliberately don't set the temperature low, in order to allow for more creativity in answers.

For big decisions, consider testing reliability by running the same question 20 times and seeing whether any of the answers are substantially worse than the other answers to the same question.
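A sketch of such a reliability probe, assuming a hypothetical `ask()` that queries the system and a `normalize()` that reduces answers to a comparable form (for long, complex answers, a grader or LLM judge would replace simple normalization):

```python
from collections import Counter

def reliability_probe(ask, normalize, question, runs=20):
    """Ask the same question repeatedly and report answer consistency."""
    answers = [normalize(ask(question)) for _ in range(runs)]
    counts = Counter(answers)
    _, freq = counts.most_common(1)[0]
    return freq / runs, counts    # consistency rate and the full distribution
```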

The above are our insights and learnings from extensive experience with AI, generative AI, and LLMs over the past 30 years. At Thomson Reuters, we put the customer at the heart of each of these decisions we make, and we are transparent that, at the point of use, all our AI-generated answers must be checked by a human.

As we work through testing our AI products, our teams do not follow each of these steps for every test; sometimes we prioritize speed of testing over rigor, or vice versa, but we make sure we clearly understand the trade-offs in prioritizing some of these steps and communicate them across our teams. The bigger and more important the decision we're trying to make, the more of these steps we follow.

This is a guest post from Mike Dahn, head of Westlaw Product, and Dasha Herrmannova, senior applied scientist, from Thomson Reuters.

Quick Check Mischaracterization Identification: New Westlaw Enhancement Furthers the Thomson Reuters Generative AI Vision
/en-us/posts/innovation/quick-check-mischaracterization-identification-new-westlaw-enhancement-furthers-the-thomson-reuters-generative-ai-vision/
Tue, 22 Oct 2024 13:19:05 +0000

Thomson Reuters recently announced deeper integration of CoCounsel 2.0 in Westlaw and Practical Law, as well as new generative AI research features, Mischaracterization Identification in Quick Check and AI Jurisdictional Surveys, that are saving customers significant time and helping them ensure the accuracy of their research. The enhancements build on the Thomson Reuters vision to deliver a comprehensive GenAI assistant for every professional it serves.

Below, CJ Lechtenberg, senior director, Westlaw Product Management, Thomson Reuters, shares her insights on developing Mischaracterization Identification, a generative AI capability to help detect mischaracterizations and omissions in legal briefs.

In the five years since Quick Check was introduced, you've added many enhancements, including Quick Check Contrary Authority Identification, Quick Check Judicial, and Quick Check Quotation Analysis. How did integrating generative AI make the Mischaracterization Identification enhancement different from previous ones?

Lechtenberg: This enhancement takes researchers beyond knowing what might be a potential mischaracterization to an explanation of why something might be a potential mischaracterization, and that is radically different from any feature we've deployed in Quick Check before.

I'm sure it'll come as no surprise when I say that generative AI is just a completely different beast. Lay people may think about the law as being black and white: you can do this; you can't do that. But legal professionals know that the law is really a sea of varying shades of gray. With machine learning, we wrestled with how we could ever give the machine enough data to figure out all the different ways an attorney may mischaracterize the law.

In Quick Check Quotation Analysis prior to the Mischaracterization Identification enhancement, we highlighted the actual textual differences (additions, omissions, and changes) in the quotations and showed the context around the quotes. Doing so certainly saved researchers a significant amount of time and helped them spot issues they might not otherwise find, but the onus was still on researchers to review everything and determine what the precise differences were and how material they might be, if at all. Even with the additional context provided, it could still be difficult to determine whether the quotations were taken out of context, especially if the quotes themselves didn't appear to be different.

In developing Mischaracterization Identification, we recognized that the task of analyzing quotations and their context is so nuanced that attorneys will have different expectations for whether a mischaracterization occurred, so we needed to provide more than just categorizations. We found that large language models (LLMs) can generate nuanced descriptions of potential mischaracterizations, versus just explicit categorizations, and do it well, which is hugely beneficial for this type of task.

How will using Mischaracterization Identification give legal professionals and law firms a competitive advantage? How will judges using it benefit?

Lechtenberg: The advantages of using the new Mischaracterization Identification are substantial for both legal professionals and the judiciary, in terms of both speed of review and quality of work product. When we launched Quick Check Quotation Analysis in 2020, customers, both legal professionals and the judiciary, lamented how time-consuming it is to review quotations and how challenging it is to spot differences. It is a mentally taxing task, and often our brains fill in the blanks, interpreting what we think a brief should say but actually doesn't. Attorneys never have a surplus of time, so the last thing they want to do is spend the little bit they have on the most tedious of tasks and still end up missing potential problems.

For attorneys, Mischaracterization Identification will help them efficiently and accurately make contextual misstatement and omission determinations for their opponents' and their own quotations and the context surrounding those quotations. The fear of missing their own mistakes is very real for attorneys, but the possibility of missing the opportunity to capitalize on their opponents' mistakes is an even larger concern. This new enhancement reduces both of those worries and will help attorneys be even better advocates for their clients.

Judges will also be able to effectively review the filings of parties in matters before them much faster. Attorneys owe a duty of candor to the judiciary, and the Mischaracterization Identification feature will help flag any potential issues quickly. An added benefit, which members of the judiciary or their staff perhaps haven't considered, is the ability to analyze their own orders and opinions to ensure that they haven't made mistakes that could be appealed. This new enhancement will help alert judges and law clerks to potential issues before they finalize their opinions.

What early feedback are you hearing from customers?

Lechtenberg: In a recent survey, 93% of law firm professionals told us they've seen opposing counsel misuse a quotation, 66% said they've seen misrepresentations by an associate or colleague, and 65% of corporate respondents said they check the accuracy of outside counsel's quotations. The need to review opposing counsels' and colleagues' briefs for mischaracterizations of the law is still a very real issue for attorneys. Likewise, attorneys have said they're always concerned about the accuracy of their work and that maintaining their reputation as a credible litigator with courts and opposing counsel is incredibly important.

Customers are extremely excited about this new Quick Check enhancement to help combat these concerns, and we've received positive feedback from them. One law firm managing partner stated that they would use this tool a lot. They cite-check their opponents' briefs, so any shortcuts are beneficial to them. They recognize that most of the time, errors are harmless, but occasionally there are things they want to bring to the court's attention, and this feature will help them spot those issues more quickly and accurately.

Another law firm partner said this new feature is the "ultimate security blanket" because everything attorneys do is based on their credibility, and this feature alerting them to quotes being taken out of context before filing with the court would calm some of those fears.

Any surprising or unexpected moments as the team worked on developing or launching Mischaracterization Identification?

Lechtenberg: The fact that we've accomplished this now with the use of LLMs is exciting, a little surprising and a long time coming. I'm an attorney who leads a team of attorneys; we're literally trained to question everything and have a healthy dose of skepticism. But I have been dreaming about a mischaracterization identification feature in Quick Check ever since we developed Quotation Analysis more than five years ago. At my core, I believed someday this could be achieved, but for years traditional machine learning approaches were just not powerful or nuanced enough to do it well.

Leveraging LLMs for a use case like this is a new frontier like we've never seen before. The LLM's ability to analyze text from an uploaded document and compare that text to the text from the cited case used to support the argument and then go beyond highlighting textual differences and provide an actual explanation of what may be problematic — whether that's a selective quote, omitted context or a misinterpreted holding — has been absolutely astounding.

What's the one thing you want everyone to know about Mischaracterization Identification?

Lechtenberg: Mischaracterization Identification will not only help researchers spot contextual misstatements and omissions in their opponents' or their own quotations and contextual statements faster and with more accuracy, but most importantly it will help them understand why those misstatements or omissions may be problematic. And, spoiler alert: Mischaracterization Identification is just the beginning of how Thomson Reuters will harness the power of generative AI in Quick Check to solve important customer problems.

For more on Mischaracterization Identification, read the press release or check out the blog post by Mike Dahn, head of Westlaw Product Management, Thomson Reuters.

The Transformative Role of AI in Professional Tools: A Conversation With David Wong and Leann Blanchfield
/en-us/posts/innovation/the-transformative-role-of-ai-in-professional-tools-a-conversation-with-david-wong-and-leann-blanchfield/
Wed, 02 Oct 2024 13:33:39 +0000

Leann Blanchfield, head of Editorial, Thomson Reuters, said now is the most exciting time in her 30+ years with the company.

In the latest TechConnect episode, Blanchfield shared how the power of generative AI, and the dramatic leap it's making in how professionals across industries can access large quantities of data, is transforming the legal industry and beyond. Blanchfield credits the more than 1,500 attorney editors on her team, who create and enhance content, with harnessing the power of generative AI for legal research.

Human expertise is just one component of how Thomson Reuters is capitalizing on the potential of generative AI. Three elements are critical, as David Wong, chief product officer, Thomson Reuters, noted in his comments about the launch of CoCounsel 2.0 at ILTACON: "We have the data, the expertise, and the tech. Few have all three in such quantity and depth."

In the new episode, Wong focused on the role of human domain experts, noting they're key to the process of creating and validating data used by AI models for professional research.

鈥淭here’s a lot of both prompt engineering, fine tuning, and system refinement that’s necessary to get quality to a usable spot,鈥 Wong said. 鈥淓xperts, experienced researchers and experienced lawyers can help to gauge whether or not the systems are correct. We couldn’t have an objective, quantified measure of quality on these systems without the editors, without those experts.鈥

Wong and Blanchfield discussed the importance of human experts in ensuring the accuracy and reliability of AI.

"Maintaining accuracy is at the heart of what the editorial team does," Blanchfield said. "It's the number-one priority across every editorial team. We maintain our content to be accurate and trusted."

Wong acknowledged it's challenging for the team to process and update unstructured, constantly changing data in real time. He said that Thomson Reuters ensures its AI models are customized to meet the varying needs of different jurisdictions through a combination of software and algorithms that take advantage of the LLMs.

"So when you ask a question, for example, we are running an end-to-end algorithm which runs search, retrieves data, re-ranks, interprets and then ultimately passes that information to a large language model to synthesize and produce the answer," Wong said. "It's a very complicated system which involves multiple types of technology, multiple types of information retrieval. Processing unstructured, dynamic data and customizing AI models requires integrating multiple technologies and algorithms to optimize performance."

Hear more of Wong and Blanchfield's insights on integrating AI into professional tools and ensuring that information is trustworthy in the latest episode of the TechConnect series, which brings diverse and dynamic perspectives from all corners of the technology world with thought-provoking questions and conversation.

How Harmful Are Errors in AI Research Results?
/en-us/posts/innovation/how-harmful-are-errors-in-ai-research-results/
Fri, 02 Aug 2024 14:19:28 +0000

AI and large language models have proven to be powerful tools for legal professionals. Our customers are seeing gains in efficiency and tell us the technology is greatly beneficial. There has been a lot of discussion lately about errors and hallucinations, but what hasn't been discussed is the extent of harm that comes from errors, or the benefit that answers containing an error can still provide.

First, let's settle on terminology. We should use terms like "errors" or "inaccuracies" instead of "hallucinations." "Hallucination" sounds smart, like we're AI insiders and know the lingo, but the term is often defined narrowly as a fabrication, which is just one type of error. Customers will be as concerned, if not more concerned, about non-fabricated statements from non-fabricated cases that, despite being real, are still incorrect for the question. "Errors" or "inaccuracies" are much better and more encompassing ways to describe the full range of problems we care about.

Next, let's consider types of errors and the risk of harm from each. Error rates are often reported as a single percentage, which reflects a binary view (either an answer has an error or it does not), but that's overly simplistic. It conflates the big differences in risk of harm from different types of errors and ignores the potential benefit of lengthy and nuanced answers that contain a minor error.

There are dozens of ways to categorize errors in LLM-generated answers, but we've found three to be most helpful:

  1. Incorrect references in otherwise correct answers
  2. Incorrect statements in otherwise correct answers
  3. Answers that are entirely incorrect

A fourth category of error that sometimes comes up in discussions with customers is about inconsistency, where the system provides a correct answer one time, then later, when the same exact question is submitted, the answer is different and sometimes less complete or incorrect. Minor differences in wording are very common when submitting the same question. Substantial differences are uncommon, but when they do result in an error, the error simply falls into one of the three categories above.

Incorrect references refer to situations where an answer is correct, but a footnote reference provided for a statement of law does not stand for the precise proposition of the statement. Fortunately, the risk of harm with these types of errors appears to be low, since they're easy to detect when researchers review the primary law cited. Answers with these types of errors still offer substantial benefit to researchers because they get them to the right answer quickly, often with a lot of nuance about the issues, but the researcher still has to use additional searches or other research techniques to find the best source material.

Incorrect statements in otherwise correct answers are often obvious in the answer. An answer might say the law is X in paragraphs 1-4, inexplicably declare the law is Y in paragraph 5, then go back to stating the law is X in paragraph 6. The risk of harm with these errors also appears to be low, since the inconsistency is obvious and prompts the researcher to dig into the primary law to figure it out. Answers with these types of errors still offer some benefit, since they point the user to highly relevant primary law, explain the issues, and help the researcher know what to look for when reviewing primary law.

Answers that are entirely wrong are more problematic. These are quite rare in our testing, but they do occur. Often a simple check of the primary sources cited will resolve the error quickly, but sometimes additional research is needed beyond that. These answers still offer some benefit to researchers, since they often point to relevant primary law in a way that is more effective and useful than traditional searching, but they also come with greater risk of harm, since the incorrectness of the answer is not obvious, and simply reviewing cited sources does not always resolve the issue.

These sound scary, but researchers have been dealing with this type of issue for ages. For instance, secondary sources can be incredibly helpful for summarizing complex areas of law and offering insights, but they sometimes fail to discuss important nuance, and sometimes the law has changed since they were written. If researchers relied on them alone, without doing further research, they would be at risk of harm, even if they consulted cited primary sources.

Yet we would never tell researchers to avoid using secondary sources because they can sometimes be beautifully written, very convincing, and utterly wrong. What we tell researchers is that they can be enormously helpful for research but must be used as part of a sound research process where primary law is reviewed, and tools like KeyCite, Key Numbers, and statute annotations are used to make sure the researcher has a complete understanding of the law.

Individual research tools have rarely been perfect. Their value has been in improving sound research practices. Stephen Embry captured this idea well in a recent blog post:

"The point is not whether Gen AI can provide perfect answers. It's whether, given the speed and efficiency of using the tools and their error rates compared to those of humans, we can develop mitigation strategies that reduce errors. That's what we do with humans. (I.E. read the cases before you cite them, please)."

But if you must check primary resources and engage in sound research practices when using a research tool, is there really any benefit to using it? If it improves overall research times or helps surface important nuance that might otherwise be missed, the answer is yes.

Prior to launching AI-Assisted Research, we knew large language models would not produce error-free answers 100% of the time, so we asked attorneys whether the tool would be valuable even with an occasional error, and whether we should release it now or wait until it was perfect.

Most of the attorneys said, "I want this now." They saw clear benefits and thought an occasional error was worth it for the extraordinary benefits of the new tool, since they would easily uncover an error when reading through primary law. They said that if they knew the answers were generated by AI, they would never trust them and would verify by checking primary sources. If there was an error, those primary sources (and further standard research checks, like looking at KeyCite flags, statute annotations, etc.) would reveal it. That's why we put AI in the name of this CoCounsel skill, so researchers would be encouraged to check primary sources.

Our customers have submitted over 1.5 million questions to AI-Assisted Research in Westlaw Precision. Generally, three big research benefits come up in discussions:

  1. It gives them a helpful overview before diving into primary sources.
  2. It uncovers sub-issues, related issues, or other nuances they might not have found as quickly with traditional approaches.
  3. It points them to the best primary sources for the question more quickly and efficiently than traditional methods of research.

Customers have described these benefits with great enthusiasm, telling us AI-Assisted Research "saves hours" and is a "game changer."

Lawyers know they need to rely on the law when writing a brief or advising a client, and the law lies in primary law documents (cases, statutes, regulations, etc.). Researchers have always known that when they're looking at something that is not a primary law document, such as a treatise section, a bar journal article, or an answer from AI, they must check the primary law before relying on it to advise a client or write a brief. That's why we cite to primary law in the answers and why we provide an even greater selection of relevant primary and secondary sources under the answers: to make this checking easy.

But what about the now-famous lawyer who filed a ChatGPT-drafted brief citing fabricated cases? That lawyer submitted his brief without ever reading any of the cases he was citing.

That can't be the standard for considering the value of products like Westlaw, which provide a rich set of research tools that make it easy to check primary sources, understand their validity, and find related material. If the standard were that a user might not read any of the primary law, many high-value research capabilities today would be deemed useless.

The way to dramatically reduce the risk of harm from LLM-based results or any other individual research tool, like secondary sources, is what it has always been: sound research practices.

Jean O'Grady conveyed this beautifully in a recent post:

"Does generative AI pose truly unique risks for legal research? In my opinion, there is no risk that could not be completely mitigated by the use of traditional legal research skills. The only real risk is lawyers losing the ability to read, comprehend and synthesize information from primary sources."

At Thomson Reuters, we're continuing to work on ways to reduce all types of errors in generative AI results, and we expect rapid improvement in the coming months. Because of the way large language models work, even with retrieval augmented generation, eliminating errors is difficult, and it's going to be quite some time before answers are completely free of errors. That's the bad news.

The good news is that harm from these types of errors can be reduced dramatically with common research practices. That's why we're not only investing in generative AI projects; we're also continuing to build out a full suite of research tools that help with the entire research process, because that process will continue to be important.

Even when errors get reduced to just 1%, that will still mean that 100% of answers need to be checked, and thorough research practices employed.

We're currently involved in two consortium efforts to provide benchmarking for generative AI products. When generative AI products for legal research are tested against these benchmarks, I expect we'll see the following:

  • None of the products will produce error-free answers 100% of the time.
  • All the products will require sound research practices, including checking primary law documents, to reduce risk of harm.
  • When sound research practices are employed, the risk of harm from errors in the answers is small and no different in magnitude from the risks we see with traditional research tools like secondary sources or Boolean search.

Even in the age of generative AI, sound research practices remain important and are here to stay. As Aravind Srinivas, CEO and cofounder of Perplexity, said:

"The journey doesn't end once you get an answer... the journey begins after you get an answer."

I think Aravind's statement applies perfectly to legal research and to the art of crafting legal arguments. Even as our teams strive to reduce errors further, we should keep in mind the benefits of generative AI and weigh them against the new and traditional risks of harm in tools that are less than perfect. When used as part of a thorough research process, these new tools offer tremendous benefits with very little risk of harm.

This is a guest post from Mike Dahn, head of Westlaw Product Management, Thomson Reuters.

Two years of unprecedented progress: Law firms deriving tangible value from Thomson Reuters AI
/en-us/posts/innovation/two-years-of-unprecedented-progress-law-firms-deriving-tangible-value-from-thomson-reuters-ai/
Tue, 02 Jul 2024 16:32:06 +0000

As we approach the two-year mark since we launched Westlaw Precision, the industry has seen unprecedented development, in many ways instigated by the launch of ChatGPT in November 2022. Customers and software developers alike have never experienced such exponential opportunity (and, some would argue, risk).

And here at Thomson Reuters, we haven't stood still; in fact, we have never moved faster!

Within this 24-month period, professionals no longer need to speculate about how AI could affect their work, because they now have a better sense of how it will, and in some cases already is.

For our customers, in November 2023 we launched AI-Assisted Research, which allows customers to ask complex legal research questions in natural language and quickly receive synthesized answers, with links to supporting authority from Westlaw content and links to further examine that authority. AI-Assisted Research streamlines the initial phase of legal research with sophisticated answers to questions and the authority those answers are based on, saving hours of work. Here is how one of our valued customers describes the solution:

"Because Thomson Reuters has the best case law database, lawyers can feel confident that the answer AI-Assisted Research is generating in response to our questions is well supported. The fact that AI-Assisted Research delivers all the resources it relied upon in coming up with the answer, right beneath the answer, amplifies the confidence we all can have in using the program to help with our research needs." Andrew Bedigian, Larson LLP

Since launch, 6,000 customers have run more than 1.5 million searches through AI-Assisted Research. The Thomson Reuters closed-loop LLM is trained on millions of terabytes of our trusted and verified content, rather than publicly available information, and this generates the most trusted and reliable answers on the market today.

"I did go through and compare the ChatGPT paid version as compared to this AI-Assisted Research. What I can tell you is there is a major difference in the libraries that Westlaw has versus any other program. There is no other program that has the secondary sources, the court orders, the appellate documents, the primary sources, every single thing that Westlaw offers, which is not only on point and published, you have the citations, there's a source of truth from where the information comes from and it's only as good as the prompts you give it and the parameters you put." Jesse Guth, owner, Guth Law Office

Our customers tell us each day what a critical tool AI-Assisted Research is for their legal research, both for those new to the profession and those experienced in the field. By design, our intuitive user experience guides customers to run follow-up research: AI-Assisted Research provides customers with a comprehensive answer that can be easily interrogated, linking to more sources for validation.

"At Blank Rome, we are committed to providing the highest levels of innovative client service. As part of this effort, last year we were excited to implement Westlaw Precision and Practical Law Dynamic AI capabilities for our attorneys, which has resulted in increased efficiencies and enhanced results." Frank Spadafino, chief information officer, Blank Rome

Over the years, Thomson Reuters has always been at the forefront of legal research innovation, helping customers reduce research times and ensure nothing important is missed. AI-Assisted Research is among the very best of these tools, and when it's used as intended, it offers enormous benefits with very little risk of harm. I strongly encourage you to try it yourself; you will find it's a powerful research tool you'll want to employ regularly in your research processes.

Thomson Reuters Launches Westlaw Edge UK with CoCounsel
/en-us/posts/innovation/thomson-reuters-launches-westlaw-edge-uk-with-cocounsel/
Mon, 22 Apr 2024 16:17:51 +0000

Thomson Reuters is expanding customers' access to AI-Assisted Research with the introduction of Westlaw Edge UK with CoCounsel. The first Thomson Reuters generative AI legal research offering in the UK, Westlaw Edge UK with CoCounsel streamlines the initial phase of legal research by allowing customers to ask complex questions in natural language and delivering synthesized answers with detailed insights from top results along with a list of key cases, legislation, and topics.

The UK rollout follows last year's launch of AI-Assisted Research in the US, which helps legal professionals quickly get answers to complex research questions.

"Westlaw Edge UK with CoCounsel will help users save time and work smarter by jumpstarting — not replacing — their current methods for legal research," said Andrew Buckley, vice president, Research and Commentary, Thomson Reuters. "AI-Assisted Research enables practitioners to ask a question in everyday language and get an answer grounded in the powerful combination of trusted, comprehensive Westlaw content and the latest in large language models. Working with increased efficiency is key for practitioners, who need the right research tools to be competitive as they navigate complex legal issues for their clients in a constantly evolving environment."

In addition to empowering customers with AI-Assisted Research, Westlaw Edge UK with CoCounsel gives customers access to an AI assistant, called CoCounsel. It's integrated with Westlaw Precision and Practical Law Dynamic Tool Set, and soon will be integrated with Document Intelligence and HighQ.

"Westlaw Edge UK with CoCounsel furthers our long-standing leadership in delivering the most sophisticated legal research solutions in the UK," Buckley said. "We launched Westlaw UK nearly 25 years ago, marking the globalization of Westlaw, and we introduced Westlaw Edge UK in 2020. Adding AI capabilities to the Westlaw Edge UK portfolio of solutions represents the next chapter in our rich history of legal innovation in the UK and using AI technology to help legal researchers be more efficient."

For more on Westlaw Edge UK with CoCounsel and additional new CoCounsel capabilities for legal and tax professionals, check out the news release.

Thomson Reuters Completes Acquisition of Casetext, Inc.
/en-us/posts/innovation/thomson-reuters-completes-acquisition-of-casetext-inc/
Thu, 17 Aug 2023 15:47:05 +0000

Thomson Reuters announced today that it has closed on its previously announced acquisition of Casetext, Inc., a provider of technology for legal professionals, for a purchase price of $650 million in cash.

Founded in 2013, Casetext uses advanced AI and machine learning to build technology for legal professionals, creating solutions that help them work more efficiently and provide higher-quality representation to more clients. Casetext's customers include more than 10,000 law firms and corporate legal departments. Its key products include CoCounsel, an AI legal assistant powered by GPT-4 that delivers document review, legal research memos, deposition preparation, and contract analysis in minutes.

The acquisition supports the Thomson Reuters "build, partner and buy" strategy to bring generative AI solutions to its customers and the company's efforts to redefine the future of professionals through applications of generative AI. Other recent developments include the Thomson Reuters commitment to invest $100 million-plus annually to integrate AI into its flagship content and technology solutions, as well as its work with Microsoft on a new plugin for Microsoft 365 Copilot, with the two companies collaborating on a legal drafting solution that leverages Westlaw, Practical Law and Document Intelligence. In July, Thomson Reuters also launched a beta program to pilot new generative AI capabilities in Westlaw with select customers.

For more details, read the press release.
