最近,就看了不少关于AI做前端还原的一些文章,之前写过一个URL就把别人网址复制了的这种耸人听闻的文章,根据里面的原理介绍,想必读过的人也知道,这种方式的弊端。那就是copy别人的网站虽然是容易的,但是AI写的代码是非常缺乏维护性的
,就连最基本的列表,他都不是list.map(it⇒item)的方式去写,而是呆板的一个一个去写。
今天看了一个github上开源的工程,Design2Code:https://github.com/NoviScl/Design2Code
那么,这个方式实现的自动化前端工程,是否是我们的前端开发小伙伴们的诺亚方舟(坟墓)呢?让我们一起来揭开他神秘的面纱吧。
这个开源的项目相关联的是这篇论文 https://arxiv.org/pdf/2403.03163.pdf ,他是这篇论文中的代码实践部分,因此我么通常可以直接看论文去了解他的原理,和他实现的效果,他既然敢公布测试的代码,那说明这篇论文里面的数据是比较可信的。
这篇论文的作者是4个大佬,分别是:
他们研究的主要目标就是,根据网页设计的屏幕截图自动生成能够渲染出该网页的HTML/CSS代码。他们的主要工作和贡献如下:
对于自动评估,他们考虑了高级视觉相似性 (CLIP) 和低级元素匹配(块匹配、文本、位置、颜色)。沿着这些不同的维度比较了所有基准模型。
可以发现,GPT-4V依然是遥遥领先的,不过,他们训练的模型倒是比Gemini Pro要略微强那么一些。
那么这几种方式的实现代码是怎么样的呢?其实我们通过了解prompt就ok了,一下三个是从代码仓库中找到的,源码在这里:
https://github.com/NoviScl/Design2Code/blob/main/Design2Code/prompting/gpt4v.py
直接提示法(Direct Prompting)
def direct_prompting(openai_client, image_file):
'''
{original input image + prompt} -> {output html}
'''
## the prompt
direct_prompt = ""
direct_prompt += "You are an expert web developer who specializes in HTML and CSS.\\n"
direct_prompt += "A user will provide you with a screenshot of a webpage.\\n"
direct_prompt += "You need to return a single html file that uses HTML and CSS to reproduce the given website.\\n"
direct_prompt += "Include all CSS code in the HTML file itself.\\n"
direct_prompt += "If it involves any images, use \\"rick.jpg\\" as the placeholder.\\n"
direct_prompt += "Some images on the webpage are replaced with a blue rectangle as the placeholder, use \\"rick.jpg\\" for those as well.\\n"
direct_prompt += "Do not hallucinate any dependencies to external files. You do not need to include JavaScript scripts for dynamic interactions.\\n"
direct_prompt += "Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout.\\n"
direct_prompt += "Respond with the content of the HTML+CSS file:\\n"
## encode image
base64_image = encode_image(image_file)
## call GPT-4V
html, prompt_tokens, completion_tokens, cost = gpt4v_call(openai_client, base64_image, direct_prompt)
return html, prompt_tokens, completion_tokens, cost
文本增强提示法(Text Augmented Prompting)
def text_augmented_prompting(openai_client, image_file):
'''
{original input image + extracted text + prompt} -> {output html}
'''
## extract all texts from the webpage
with open(image_file.replace(".png", ".html"), "r") as f:
html_content = f.read()
texts = "\\n".join(extract_text_from_html(html_content))
## the prompt
text_augmented_prompt = ""
text_augmented_prompt += "You are an expert web developer who specializes in HTML and CSS.\\n"
text_augmented_prompt += "A user will provide you with a screenshot of a webpage, along with all texts that they want to put on the webpage.\\n"
text_augmented_prompt += "The text elements are:\\n" + texts + "\\n"
text_augmented_prompt += "You should generate the correct layout structure for the webpage, and put the texts in the correct places so that the resultant webpage will look the same as the given one.\\n"
text_augmented_prompt += "You need to return a single html file that uses HTML and CSS to reproduce the given website.\\n"
text_augmented_prompt += "Include all CSS code in the HTML file itself.\\n"
text_augmented_prompt += "If it involves any images, use \\"rick.jpg\\" as the placeholder.\\n"
text_augmented_prompt += "Some images on the webpage are replaced with a blue rectangle as the placeholder, use \\"rick.jpg\\" for those as well.\\n"
text_augmented_prompt += "Do not hallucinate any dependencies to external files. You do not need to include JavaScript scripts for dynamic interactions.\\n"
text_augmented_prompt += "Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout.\\n"
text_augmented_prompt += "Respond with the content of the HTML+CSS file (directly start with the code, do not add any additional explanation):\\n"
## encode image
base64_image = encode_image(image_file)
## call GPT-4V
html, prompt_tokens, completion_tokens, cost = gpt4v_call(openai_client, base64_image, text_augmented_prompt)
return html, prompt_tokens, completion_tokens, cost
视觉修订提示法(Visual Revision Prompting)
def visual_revision_prompting(openai_client, input_image_file, original_output_image):
'''
{input image + initial output image + initial output html + oracle extracted text} -> {revised output html}
'''
## load the original output
with open(original_output_image.replace(".png", ".html"), "r") as f:
original_output_html = f.read()
## encode the image
input_image = encode_image(input_image_file)
original_output_image = encode_image(original_output_image)
## extract all texts from the webpage
with open(input_image_file.replace(".png", ".html"), "r") as f:
html_content = f.read()
texts = "\\n".join(extract_text_from_html(html_content))
prompt = ""
prompt += "You are an expert web developer who specializes in HTML and CSS.\\n"
prompt += "I have an HTML file for implementing a webpage but it has some missing or wrong elements that are different from the original webpage. The current implementation I have is:\\n" + original_output_html + "\\n\\n"
prompt += "I will provide the reference webpage that I want to build as well as the rendered webpage of the current implementation.\\n"
prompt += "I also provide you all the texts that I want to include in the webpage here:\\n"
prompt += "\\n".join(texts) + "\\n\\n"
prompt += "Please compare the two webpages and refer to the provided text elements to be included, and revise the original HTML implementation to make it look exactly like the reference webpage. Make sure the code is syntactically correct and can render into a well-formed webpage. You can use \\"rick.jpg\\" as the placeholder image file.\\n"
prompt += "Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout.\\n"
prompt += "Respond directly with the content of the new revised and improved HTML file without any extra explanations:\\n"
html, prompt_tokens, completion_tokens, cost = gpt4v_revision_call(openai_client, input_image, original_output_image, prompt)
return html, prompt_tokens, completion_tokens, cost
那么,这几种方式各有什么样的特点呢?
从结果上来看,GPT-4V Self-Revision Prompting的方式效果会更好一些:效果如下图
从图中,我们可以看到,还原度上,是绝对不能说100%的,甚至80%可能多有些勉强了,这对于像素眼的视觉设计师来讲,是万万不能接受的。因此,拿到这份AI自动转化的代码,可能还是需要很多的精力来做调整,谁能保证,比手工自己来写,然后配合copilot“结对编程(哈哈)”更加高效呢
?我想那些经验十足的前端开发者们,已经迫不及待想和前端代码自动生成的各种模型来大干一架,让像素眼的设计师们评判一下,到底谁还是这个领域的王者。
虽然,这篇论文中,我们需要肯定了Design2Code的意义,他可以降低前端开发的门槛,但我不认同他可以在短期内就取代前端开发,论文中也对各模型的细粒度表现进行了分析,指出了开源模型的不足之处,如召回输入元素、生成布局等方面有待提高
。这个也基本上决定了在自动化前端工程方面,也承认了前端工程自动化还有比较远的路需要走,但是好在,一步一步的看清了方向,就像,10年前,谁会相信GPT这么霸道呢?
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。