LLMs and Copyright Risks: Benchmarks and Mitigation Approaches

2:00pm-5:30pm, Ruidoso, NAACL 2025

The landscape of artificial intelligence has been dramatically transformed by the advent of Large Language Models (LLMs) such as GPT and its successors. These systems have not only reshaped natural language processing but have also spread into sectors including healthcare, software development, finance, and education. While LLMs have unlocked new capabilities in text generation and analysis, they have also raised complex legal and ethical challenges, particularly in copyright law. Their ability to produce human-like text has blurred the boundary between original creation and potential copyright infringement, as evidenced by The New York Times's recent legal action against AI companies, including Microsoft. This tutorial navigates this terrain, exploring the copyright issues surrounding LLMs and equipping participants with the knowledge and tools to address them.

Schedule

Outline

  1. Copyright Law and LLMs
  2. Probing and Benchmarking
  3. Introduction of SHIELD
  4. Copyright Behavior Backtracking
  5. Copyright Risk Mitigation
  6. Mitigating Copyright Risks via LLM Alignment
  7. Copyright and Plagiarism in AI4Science
  8. An Example for Future Directions

Slides

Slides are available here:

Speakers

David Atkinson
UT-Austin
Xiusi Chen
UIUC
Jing Gao
Purdue
Huawei Lin
RIT
Xiaoze Liu
Purdue
Qingyun Wang
W&M
Boyi Wei
Princeton
Zhaozhuo Xu
Stevens
Denghui Zhang
Stevens

Contact

For questions, please email: dzhang42@stevens.edu

Citation

If you find this material helpful, please consider citing the following papers:

@inproceedings{zhang-etal-2025-llms,
    title = "{LLM}s and Copyright Risks: Benchmarks and Mitigation Approaches",
    author = "Zhang, Denghui  and
      Xu, Zhaozhuo  and
      Zhao, Weijie",
    editor = "Lomeli, Maria  and
      Swayamdipta, Swabha  and
      Zhang, Rui",
    booktitle = "Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 5: Tutorial Abstracts)",
    month = may,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.naacl-tutorial.7/",
    doi = "10.18653/v1/2025.naacl-tutorial.7",
    pages = "44--50",
    ISBN = "979-8-89176-193-3"
}

@inproceedings{xu-etal-2024-llms,
    title = "Do {LLM}s Know to Respect Copyright Notice?",
    author = "Xu, Jialiang  and
      Li, Shenglan  and
      Xu, Zhaozhuo  and
      Zhang, Denghui",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.1147/",
    doi = "10.18653/v1/2024.emnlp-main.1147",
    pages = "20604--20619"
}