Google Vertex AI Agent Builder¶

Overview

Here we'll use Vertex AI Agent Builder with the MITRE CWE specification to aid mapping CWEs to vulnerability descriptions.

This is a no-code option.

We'll implement a closed, grounded system to ensure the accuracy of the data (and mitigate hallucinations).

Grounded: content is provided to inform the answers
Closed system: answers come from only the documents you provide

Tip

In other words, we'll build NotebookLM.

Here, NotebookLM is basically a combination of Vertex AI Search for unstructured data (PDFs, HTML, etc.), Vertex AI Grounding, and a custom UX/UI.
But we'll take advantage of the structured data (JSON) that we have for the MITRE CWE list, instead of using the unstructured data from the MITRE CWE list PDF.

Result

The result is a grounded closed system that compares in performance and accuracy to NotebookLM.

But the responses don't include reference links to the source content. I didn't add that part yet, but it's standard functionality that is easy to add in Vertex AI.

Grounding Confidence¶

Quote

For each response generated from the content of your connected data stores, the agent evaluates a confidence level, which gauges the confidence that all information in the response is supported by information in the data stores. You can customize which responses to allow by selecting the lowest confidence level you are comfortable with. Only responses at or above that confidence level will be shown.

There are 5 confidence levels to choose from: VERY_LOW, LOW, MEDIUM, HIGH, and VERY_HIGH.

https://cloud.google.com/dialogflow/vertex/docs/concept/tools
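The thresholding rule in the quote above can be illustrated with a small sketch. This is not the Agent Builder implementation; the `GroundingConfidence` enum, `is_shown`, and `filter_responses` are hypothetical names used only to show how "at or above the selected level" behaves across the five levels.

```python
from enum import IntEnum


class GroundingConfidence(IntEnum):
    """The five levels named in the docs, ordered from weakest to strongest."""
    VERY_LOW = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    VERY_HIGH = 5


def is_shown(response_confidence, threshold):
    """A response is shown only if its confidence is at or above the threshold."""
    return response_confidence >= threshold


def filter_responses(responses, threshold):
    """Keep the text of (text, confidence) pairs that meet the threshold."""
    return [text for text, confidence in responses if is_shown(confidence, threshold)]


if __name__ == "__main__":
    # Hypothetical responses, for illustration only.
    candidates = [
        ("Answer supported by the data store", GroundingConfidence.VERY_HIGH),
        ("Answer only loosely supported", GroundingConfidence.LOW),
    ]
    print(filter_responses(candidates, GroundingConfidence.MEDIUM))
```

Raising the threshold trades coverage for accuracy: fewer questions get an answer, but the answers that are shown are better supported by the data store.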

Quote

To create a data store and connect it to your app, you can use the Tools link in the left navigation of the console. Follow the instructions to create a data store. https://cloud.google.com/dialogflow/vertex/docs/concept/tools

Recipe¶

Same recipe as before, but we'll use Google Vertex AI Agent Builder.

MITRE CWE Specification¶

We use the same MITRE CWE Specification as the data source.

Build Vertex AI Agent¶

This link gives the steps with links to the details, summarized here:

Vertex AI Agent Builder
Create App
Select app type
1. Agent (preview) "Built using natural language, agents can answer questions from data, connect with business systems through tools and more"
Create Data Store
1. The MITRE CWE JSON data is converted to jsonl format for import
2. It takes ~5 minutes to ingest (create embeddings for) the jsonl file
There are lots of other Settings available, like Logging and Git integration to push/pull agents from a GitHub repo, or you can just download the JSON file that represents the agent.
The built agent supports Interactions with the API.

Tip

To create a Grounded Open system, select "search" app type.

The agent will retrieve information from the local documents you provide and via web search.

Note

Alternatively these steps can be implemented with code e.g. https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/ground-gemini#generative-ai-gemini-grounding-python_vertex_ai_sdk
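As a rough sketch of the code route, the snippet below grounds a Gemini model on a Vertex AI Search data store using the Vertex AI Python SDK, following the pattern on the linked page as I understand it. The project ID, data store ID, regions, and model name are all placeholder assumptions, the `datastore_resource_name` and `ask_grounded` helpers are hypothetical, and the SDK surface may have changed since this was written, so check the linked documentation before relying on it.

```python
def datastore_resource_name(project_id, datastore_id, location="global"):
    """Build the full resource name of a Vertex AI Search data store."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/collections/default_collection/dataStores/{datastore_id}"
    )


def ask_grounded(prompt, project_id, datastore_id, model_name="gemini-1.5-flash-001"):
    """Ask a question grounded on the data store (needs GCP credentials)."""
    # Imported lazily so the helper above works without the SDK installed.
    import vertexai
    from vertexai.preview.generative_models import GenerativeModel, Tool, grounding

    # Assumption: the model is served from us-central1; adjust as needed.
    vertexai.init(project=project_id, location="us-central1")

    # Retrieval tool that points the model at the data store.
    tool = Tool.from_retrieval(
        grounding.Retrieval(
            grounding.VertexAISearch(
                datastore=datastore_resource_name(project_id, datastore_id)
            )
        )
    )
    model = GenerativeModel(model_name)
    response = model.generate_content(prompt, tools=[tool])
    return response.text


if __name__ == "__main__":
    # Placeholder IDs; replace with your own project and data store.
    print(ask_grounded("What CWE IDs relate to XSS?", "my-project", "cwe-store"))
```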

Quote

Note: Conversation history is used as context during tool invocation.

Data Preprocessing¶

Remove unneeded sections from the JSON:

content_history
views
categories

python3 ./trim_json.py # cwe.json -> cwe_trimmed.json
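The contents of trim_json.py are not shown here, so the following is only a minimal sketch of what such a trimming step could look like. It assumes the three sections appear as keys somewhere in the JSON tree and matches them case-insensitively; the `trim` and `trim_file` names are hypothetical.

```python
import json

# Sections listed above as unneeded (matched case-insensitively, an assumption).
UNNEEDED_SECTIONS = {"content_history", "views", "categories"}


def trim(node, unneeded=UNNEEDED_SECTIONS):
    """Recursively drop unneeded keys from nested dicts and lists."""
    if isinstance(node, dict):
        return {
            key: trim(value, unneeded)
            for key, value in node.items()
            if key.lower() not in unneeded
        }
    if isinstance(node, list):
        return [trim(item, unneeded) for item in node]
    return node


def trim_file(src="cwe.json", dst="cwe_trimmed.json"):
    """Read src, remove the unneeded sections, and write the result to dst."""
    with open(src, encoding="utf-8") as f:
        data = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(trim(data), f, ensure_ascii=False)


if __name__ == "__main__":
    trim_file()
```

Trimming before import keeps the data store focused on the weakness content itself, which is the part the agent needs to answer questions.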

Convert to jsonl for import

python3 ./json_to_jsonl.py # cwe_trimmed.json -> cwe_trimmed.jsonl
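Likewise, json_to_jsonl.py is not shown, so this is a hedged sketch of the conversion: JSONL is simply one JSON object per line. The `records_key` parameter (default `"weaknesses"`) is a hypothetical name for wherever the list of CWE entries lives in the trimmed file; adjust it to the actual structure.

```python
import json


def to_jsonl_lines(records):
    """Serialize each record as a single-line JSON string."""
    return [json.dumps(record, ensure_ascii=False) for record in records]


def convert(src="cwe_trimmed.json", dst="cwe_trimmed.jsonl", records_key="weaknesses"):
    """Write one JSON object per line; returns the number of records written."""
    with open(src, encoding="utf-8") as f:
        data = json.load(f)
    # Accept either a top-level list or a dict holding the list under records_key.
    records = data if isinstance(data, list) else data[records_key]
    with open(dst, "w", encoding="utf-8") as f:
        for line in to_jsonl_lines(records):
            f.write(line + "\n")
    return len(records)


if __name__ == "__main__":
    print(convert(), "records written")
```

Because `json.dumps` escapes newlines inside strings, each record stays on exactly one line even when descriptions span multiple paragraphs.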

Data Import¶

Check if the System is Closed¶

Quote

What is a dog?

Success

The system is closed: the agent can't answer the question because there is no information about dogs in the MITRE CWE specification.

What are the different types of XSS?¶

What CWE IDs Relate To XSS?¶

What Is The Parent Weakness Or CWE For XSS And CSRF?¶

What CWE IDs Relate To Path Or Directory Traversal? List All CWE IDs And Their Description In A Table¶

What is the CWE Associated With CVE¶

Quote

What is the CWE associated with CVE-2021-27104 "Accellion FTA OS Command Injection Vulnerability"

What is the CWE associated with "Cisco Small Business RV320 and RV325 Routers Information Disclosure Vulnerability"

Example Usage: CWE-1394¶

Quote

what is the best CWE to describe the root cause weakness in CVE "ProductX contains a default SSH public key in the authorized_keys file. A remote attacker could use this key to gain root privileges.".

Other App Builder Docs¶

These were not used or required, but I'm listing them here as I found them informative.

Takeaways¶

Takeaways

Google Vertex AI Agent Builder allows/requires more control over the agent than ChatGPT GPTs currently do.
Google Vertex AI Agent Builder supports a Closed (or Open) System with Grounding and a Grounding Confidence threshold, unlike ChatGPT GPTs currently.
This comes close to NotebookLM, but it does not provide references to the original documents from which the answer was determined.