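This may be a crazy scenario, but I just want to know: can we capitalize a few characters within a word (identifying which characters to capitalize) when there are no delimiters?
E.g. I/P: candidatefirstname candidatecity
O/P: CandidateFirstName CandidateCity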
Posted on 2022-08-09 07:32:21
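I wrote the program below in TXR Lisp to solve this problem. It relies on the /usr/share/dict/words dictionary file being available; in addition, it adds a number of abbreviations that commonly occur in computer program identifiers. I collected these from a few sources and added some of my own.
The program uses TXR Lisp's built-in trie facility to build a trie from the dictionary. The trie structure is poorly optimized for large dictionaries and low-memory situations, and the code is a bit clumsy; it could benefit from streamlining.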
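To illustrate why a trie suits this job, here is a minimal Python sketch (not the TXR Lisp code from this answer) of a trie that reports every dictionary word that is a prefix of a string in a single left-to-right walk. The names `make_trie`, `prefixes` and `END`, and the tiny word list, are hypothetical stand-ins; the real program loads /usr/share/dict/words.

```python
# A dict-of-dicts trie; a Python stand-in for illustration, not TXR's built-in trie.
END = object()  # sentinel key marking "a word ends at this node"


def make_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def prefixes(trie, text):
    """Return every prefix of text that is a word in the trie, shortest first."""
    found = []
    node = trie
    for i, ch in enumerate(text):
        node = node.get(ch)
        if node is None:  # no dictionary word continues with this character
            break
        if END in node:
            found.append(text[:i + 1])
    return found


# Tiny hypothetical word list standing in for the dictionary file.
trie = make_trie(["a", "alb", "album", "albumin", "the", "them"])
print(prefixes(trie, "albumin"))    # ['a', 'alb', 'album', 'albumin']
print(prefixes(trie, "themother"))  # ['the', 'them']
```

The walk stops as soon as no word can continue, so finding all prefixes costs at most one step per character of the input.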
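At the top level, the camelize function returns a list of what it considers good camel-case versions of the word (often just one), attempting to do what the question asks.
I left it at that: it is just a function that can be explored from the REPL, with no utility wrapped around it: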
$ txr -i camel.tl
Do not operate heavy equipment or motor vehicles while using TXR.
1> (camelize "shellexecuteex")
("ShellExecuteEx")
2> (camelize "getenvptr")
("GetEnvPtr")
3> (camelize "hownowbrowncow")
("HowNowBrownCow")
4> (camelize "arcanthill")
("ArcAnthill")
5> (camelize "calcmd5digest")
("CalcMd5Digest")
6> (camelize "themother")
("TheMother" "ThemOther")
The core of how it works is the function break-word, which returns a list of all possible breakdowns of a word into chunks.
The algorithm is recursive and works roughly as follows:
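1. Using the trie, find every prefix of the word that is a dictionary word. For example, albumin gives the prefixes a, alb, album and albumin.
2. For each such prefix, recurse on the rest of the word and combine the prefix with each breakdown of the rest.
3. If no prefix is a word, skip leading characters until a word is found. For example, 34albumin will skip 3 and 4 and then find a. Once the "junk" has been delimited, treat it like a word: recurse on the rest of the word and combine, similarly to step 2.

The camelize function takes the output of break-word and chooses a set of candidate breakdowns as follows: each breakdown is ranked by its junk quantity (the number of junk characters it contains) and then by its number of elements, and all breakdowns that share the lowest rank are kept.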
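In the themother example, there are two breakdowns: the mother and them other. Neither contains junk characters, so their junk quantity is zero. Both are two elements long, so they are equivalent, and the algorithm selects both.

Junk is identified in a word breakdown by a list notation: it looks like (:junk "grf"). For example, cat3dog will produce a breakdown like ("cat" (:junk "3") "dog"). The code in junk-quantity and in the camelize function uses some structural pattern matching to deal with this.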
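The breakdown and selection steps described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a port of the TXR Lisp program: it looks words up in a plain set instead of a trie, it represents junk as a `("junk", text)` tuple in place of `(:junk "...")`, and the word set `WORDS` and the helper `rank` are hypothetical.

```python
WORDS = {"the", "them", "mother", "other", "cat", "dog"}  # hypothetical mini-dictionary


def break_word(words, text):
    """Return all breakdowns of text; junk runs appear as ('junk', s) tuples."""
    if not text:
        return []
    results = []
    # Steps 1 and 2: each dictionary prefix, combined with breakdowns of the rest.
    for i in range(1, len(text) + 1):
        if text[:i] in words:
            rest = break_word(words, text[i:])
            if rest:
                results.extend([text[:i]] + r for r in rest)
            else:
                results.append([text[:i]])
    # Step 3: no word starts here, so skip leading junk until a word can start.
    if not results:
        for j in range(1, len(text)):
            if any(text[j:k] in words for k in range(j + 1, len(text) + 1)):
                junk = ("junk", text[:j])
                results.extend([junk] + r for r in break_word(words, text[j:]))
                break
    # Nothing matched at all: the whole text is junk.
    return results or [[("junk", text)]]


def junk_quantity(breakdown):
    """Total number of characters that sit inside junk elements."""
    return sum(len(part[1]) for part in breakdown if isinstance(part, tuple))


def rank(breakdown):
    """Sort key: fewer junk characters first, then fewer elements."""
    return (junk_quantity(breakdown), len(breakdown))


def camelize(words, text):
    """Camel-case text using every breakdown that shares the lowest rank."""
    if not text:
        return []
    breakdowns = break_word(words, text)
    best = min(rank(b) for b in breakdowns)

    def cap(part):
        s = part[1] if isinstance(part, tuple) else part
        return s[:1].upper() + s[1:]

    return ["".join(cap(p) for p in b) for b in breakdowns if rank(b) == best]


print(break_word(WORDS, "cat3dog"))  # [['cat', ('junk', '3'), 'dog']]
print(camelize(WORDS, "themother"))  # ['TheMother', 'ThemOther']
```

Both themother breakdowns have rank (0, 2), which is why both survive the selection.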
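Long inputs take some time; the input below, for example, takes a few seconds. This can be sped up by compiling the code. The break-word function could also benefit from memoization, because the recursive search computes many of the same suffix breakdowns over and over: different combinations of prefixes lead to the same remaining suffix.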
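As a rough idea of what that memoization could look like, here is a Python sketch that caches breakdowns per suffix with `functools.lru_cache`. It is not the TXR Lisp code: `make_breaker`, the tuple-based junk marker and the small word set are assumptions made for the example, and results are returned as tuples so that cached values cannot be mutated by callers.

```python
from functools import lru_cache


def make_breaker(words):
    """Return a memoized breakdown function bound to one word set."""
    words = frozenset(words)

    @lru_cache(maxsize=None)
    def breakdowns(text):
        if not text:
            return ()
        found = []
        # Every dictionary prefix, combined with the cached breakdowns of the rest.
        for i in range(1, len(text) + 1):
            head = text[:i]
            if head in words:
                rest = breakdowns(text[i:])
                if rest:
                    found.extend((head,) + r for r in rest)
                else:
                    found.append((head,))
        # No word starts here: skip leading junk until a word can start.
        if not found:
            for j in range(1, len(text)):
                if any(text[j:k] in words for k in range(j + 1, len(text) + 1)):
                    junk = ("junk", text[:j])
                    found.extend((junk,) + r for r in breakdowns(text[j:]))
                    break
        # Fall back to treating the whole text as junk.
        return tuple(found) or ((("junk", text),),)

    return breakdowns


breakdowns = make_breaker({"to", "tot", "the", "he", "aid"})  # hypothetical words
result = breakdowns("totheaid")
print(result)  # (('to', 'the', 'aid'), ('tot', 'he', 'aid'))
print(breakdowns.cache_info().hits)  # 1: the suffix "aid" was reused once
```

Here both to + the and tot + he leave the same suffix aid, so its breakdown is computed once and then served from the cache; on long inputs the number of such shared suffixes grows quickly.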
1> (camelize "nowisthetimeforallgoodmentocometotheaidoftheircountry")
("NoWistHeTimeForAllGoodMenToComeToTheAidOfTheirCountry"
"NoWistHeTimeForAllGoodMenToComeToTheAidOftHeirCountry"
"NoWistHeTimeForAllGoodMenToComeTotHeAidOfTheirCountry"
"NoWistHeTimeForAllGoodMenToComeTotHeAidOftHeirCountry"
"NoWistHeTimeForaLlGoodMenToComeToTheAidOfTheirCountry"
"NoWistHeTimeForaLlGoodMenToComeToTheAidOftHeirCountry"
"NoWistHeTimeForaLlGoodMenToComeTotHeAidOfTheirCountry"
"NoWistHeTimeForaLlGoodMenToComeTotHeAidOftHeirCountry"
"NowIsTheTimeForAllGoodMenToComeToTheAidOfTheirCountry"
"NowIsTheTimeForAllGoodMenToComeToTheAidOftHeirCountry"
"NowIsTheTimeForAllGoodMenToComeTotHeAidOfTheirCountry"
"NowIsTheTimeForAllGoodMenToComeTotHeAidOftHeirCountry"
"NowIsTheTimeForaLlGoodMenToComeToTheAidOfTheirCountry"
"NowIsTheTimeForaLlGoodMenToComeToTheAidOftHeirCountry"
"NowIsTheTimeForaLlGoodMenToComeTotHeAidOfTheirCountry"
"NowIsTheTimeForaLlGoodMenToComeTotHeAidOftHeirCountry")
(Why does it find ForaLlGood? Because ll is listed as a word, since it is an abbreviation used in computing identifiers.)
Now just the code:
(defvarl %dict% "/usr/share/dict/words")

;; Build a trie from the dictionary words longer than two characters, plus a
;; hand-picked list of short words and common identifier abbreviations.
(defun trie-dict (dict)
  (let ((trie (make-trie)))
    (each ((word dict))
      (if (> (len word) 2)
        (trie-add trie word t)))
    (each ((word '#"a I ad am an as at ax be by do ex go he hi \
                    id if in is it lo me mi my no of oh on or \
                    ow ox pa pi re so to un up us we abs act \
                    addr alloc alt arg attr app arr auth avg \
                    bat bg bin bool brk btn buf char calc cb \
                    cert cfg ch chr circ clr cmd cmp cnt \
                    concat conf config conn cont conv col coll \
                    com cord coord cos csum ctrl ctx cur cpy \
                    db dbg dec def def del dest dev dev diff \
                    dir dis disp doc drv dsc dt en enc env eq err \
                    expr exch exchg fig fmt fp func ge gen gt hex \
                    hdr hor hw id idx iface img inc info init int \
                    lang lat lib le len ll lon math max mem mcu \
                    mid min misc mng mod msg ne net num obj ord \
                    op os param pic pos posix pred pref prev proc \
                    prof ptr pwr px qry rand rect recv rem res \
                    ret rev req rng rx sem sel seq stat std str \
                    sin sqrt src swp sync temp temp tgl tmp tmr \
                    tran trans ts tx txt unix usr val var vert win \
                    xform xmit xref xtract"))
      (trie-add trie word t))
    trie))

(defvarl %trie% (trie-dict (file-get-lines %dict%)))

;; Return all breakdowns of word; unrecognized runs appear as (:junk "...").
(defun break-word (trie word)
  (iflet ((lw (len word))
          ((plusp lw)))
    (build
      (let ((i 0)
            (cursor (trie-lookup-begin trie)))
        ;; Pass 1: every dictionary prefix, combined with each breakdown of the rest.
        (whilet ((next (if (< i lw)
                         (trie-lookup-feed-char cursor [word i]))))
          (inc i)
          (set cursor next)
          (if (trie-value-at next)
            (let ((first-word [word 0..i])
                  (rest-words (break-word trie [word i..:])))
              (if rest-words
                (each ((rest-wordlist rest-words))
                  (add ^(,first-word ,*rest-wordlist)))
                (add ^(,first-word))))))
        ;; Pass 2, only if pass 1 found nothing: skip j leading characters as junk.
        (unless (get)
          (for ((j 1)) ((and (< j lw) (not (get)))) ((inc j))
            (let ((i j)
                  (cursor (trie-lookup-begin trie)))
              (whilet ((next (if (and (< i lw) (not (get)))
                               (trie-lookup-feed-char cursor [word i]))))
                (inc i)
                (set cursor next)
                (if (trie-value-at next)
                  (let ((junk-word [word 0..j])
                        (rest-words (break-word trie [word j..:])))
                    (each ((rest-wordlist rest-words))
                      (add ^((:junk ,junk-word) ,*rest-wordlist)))))))))
        ;; Nothing matched at all: the whole word is junk.
        (unless (get)
          (add ^((:junk ,word))))))))

;; Total number of characters in the :junk elements of a breakdown.
(defun junk-quantity (broken-word)
  (let ((char-count 0))
    (each ((word broken-word))
      (if-match (:junk @str) word
        (inc char-count (len str))))
    char-count))

;; Keep the breakdowns with the lowest (junk-quantity, length) key,
;; capitalize the first character of each piece and join the pieces.
(defun camelize (word)
  (if (empty word)
    word
    (flow (break-word %trie% word)
      (mapcar [juxt [juxt junk-quantity len] use])
      (sort @1 : first)
      (partition-by first)
      first
      (mapcar second)
      (mapcar
        (opip (mapcar (do match @(or `@{x 1}@y`
                                     (:junk `@{x 1}@y`))
                        @1
                        `@(upcase-str x)@y`))
              cat-str)))))

https://unix.stackexchange.com/questions/599889