Q: Capitalize a few characters within a word

Unix & Linux user

Asked 2020-07-23 04:01:28

Answers: 1, Views: 53, Followers: 0, Score: 1

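This may be a crazy scenario, but I just want to know: can we capitalize a few characters within a word when there is no delimiter to identify which characters should be capitalized?

E.g. I/P: candidatefirstname candidatecity

O/P: CandidateFirstName CandidateCity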

shell-script

awk

sed

1 Answer

Unix & Linux user

Answered 2022-08-09 07:32:21

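I wrote the following program in TXR Lisp to solve this. It relies on the /usr/share/dict/words dictionary file being available; in addition, it adds a number of abbreviations that are common in computer program identifiers. I found some of them in a source and added some of my own.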

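The program uses TXR Lisp's built-in trie facilities to build a trie from the dictionary. The trie structure is poorly optimized for large dictionaries and for memory-constrained situations, and the code is a bit clunky; it could benefit from streamlining.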
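To illustrate what the trie is used for, here is a minimal sketch in Python rather than TXR Lisp. It is not the TXR trie API: the `Trie` class, its `add` and `prefixes` methods, and the five-word list are all hypothetical stand-ins chosen for this example. It shows how a single walk down the trie finds every dictionary word that is a prefix of the input.

```python
class Trie:
    """A minimal character trie: a dict of child nodes plus an end-of-word flag."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def add(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def prefixes(self, text, start=0):
        """Yield each end index i such that text[start:i] is a stored word."""
        node = self
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                return  # no stored word continues with this character
            if node.is_word:
                yield i + 1


# Hypothetical miniature word list standing in for /usr/share/dict/words.
trie = Trie()
for w in ["a", "alb", "album", "albumin", "min"]:
    trie.add(w)

found = ["albumin"[:end] for end in trie.prefixes("albumin")]
print(found)  # ['a', 'alb', 'album', 'albumin']
```

The walk stops as soon as no stored word can continue, which is what makes a trie attractive for this kind of prefix search.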

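At the top level, the function camelize returns a list of what it considers to be the good camel-case versions of a word (usually just one), trying to meet the requirement in the question.

I left it at that: it is just a function that can be explored from the REPL, with no utility wrapped around it: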

$ txr -i camel.tl
Do not operate heavy equipment or motor vehicles while using TXR.
1> (camelize "shellexecuteex")
("ShellExecuteEx")
2> (camelize "getenvptr")
("GetEnvPtr")
3> (camelize "hownowbrowncow")
("HowNowBrownCow")
4> (camelize "arcanthill")
("ArcAnthill")
5> (camelize "calcmd5digest")
("CalcMd5Digest")
6> (camelize "themother")
("TheMother" "ThemOther")

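The core of how it works is the function break-word, which returns a list of all possible ways of breaking a word into pieces.

The algorithm is recursive and works roughly as follows: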

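1. Try to find all prefixes of the given word that are in the dictionary. For example, albumin yields the prefixes a, alb, album and albumin.
2. If one or more prefixes are found, recurse: for each prefix, find the possible ways of splitting the remaining suffix into words, and put the prefix in front of each of those possibilities.
3. If no prefix is found, the word starts with junk characters that are not in the dictionary. In that case, scan successive positions of the word to see whether a dictionary word occurs. For example, given 34albumin, 3 and 4 are skipped and then a is found. Once the junk has been delimited, treat it as a word: recurse on the rest of the word and combine, as in step 2.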
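The three steps above can be sketched in Python (not the TXR Lisp used in this answer). This is a rough illustration under stated assumptions: a plain `set` stands in for the trie, the word list is a tiny made-up sample, and junk is tagged as a `("junk", text)` tuple in place of the `(:junk "...")` notation. The name `break_word` mirrors break-word, but the code is a sketch of the idea rather than a line-by-line port.

```python
def break_word(words, text):
    """Return every way of splitting text into dictionary words and junk runs."""
    if not text:
        return []
    n = len(text)
    results = []
    # Steps 1 and 2: try each dictionary prefix, then recurse on the suffix.
    for i in range(1, n + 1):
        head = text[:i]
        if head in words:
            rest = break_word(words, text[i:])
            if rest:
                results.extend([head] + r for r in rest)
            else:
                results.append([head])  # the prefix consumed the whole text
    if results:
        return results
    # Step 3: no dictionary prefix, so skip junk until some word starts.
    for j in range(1, n):
        if any(text[j:k] in words for k in range(j + 1, n + 1)):
            junk = ("junk", text[:j])
            return [[junk] + r for r in break_word(words, text[j:])]
    # Nothing recognizable anywhere: the whole text is junk.
    return [[("junk", text)]]


# Hypothetical sample dictionary, for illustration only.
WORDS = {"a", "alb", "album", "albumin", "min",
         "the", "them", "mother", "other", "cat", "dog"}

print(break_word(WORDS, "themother"))
print(break_word(WORDS, "cat3dog"))
```

With this sample dictionary, themother yields both the/mother and them/other, and cat3dog yields a single breakdown with 3 marked as junk.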

The camelize function takes the output of break-word and selects a set of candidate breakdowns, as follows.

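1. Each word breakdown is assigned a sort key consisting of a pair of numbers: the junk quantity and the length of that breakdown. The junk quantity is the number of characters that are not in the dictionary. The length is the number of elements in the breakdown.
2. We sort the list of breakdowns by the key determined in step 1. A breakdown containing more junk characters is considered worse. When two breakdowns contain the same number of junk characters, the one with more pieces is considered worse. For example, in the themother case there are two breakdowns: the mother and them other. Neither contains junk characters, so their junk quantity is zero. Both are two elements long, so they are equivalent. The algorithm selects both of them.
3. After the sort in step 2, we group the sorted list by equal keys and select the best group. Then we produce CamelCase from that group.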
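The selection steps above might look like this in Python (a sketch, not the answer's TXR Lisp). The candidate list is made up for illustration, the `("junk", text)` tuple is an assumed stand-in for the `(:junk "...")` notation, and `best_breakdowns` is a hypothetical name.

```python
from itertools import groupby


def junk_quantity(breakdown):
    """Count the characters that sit inside ("junk", text) pieces."""
    return sum(len(piece[1]) for piece in breakdown if isinstance(piece, tuple))


def sort_key(breakdown):
    # Less junk is better; with equal junk, fewer pieces is better.
    return (junk_quantity(breakdown), len(breakdown))


def best_breakdowns(breakdowns):
    """Sort by key, group equal keys, and return only the first (best) group."""
    ranked = sorted(breakdowns, key=sort_key)
    for _, group in groupby(ranked, key=sort_key):
        return list(group)
    return []


# Made-up candidates for "themother"; only the first two are plausible.
candidates = [
    ["them", "other"],
    ["the", ("junk", "m"), "other"],
    ["the", "mother"],
    ["the", "moth", "er"],
]
best = best_breakdowns(candidates)
print(best)
```

Because Python's sort is stable, breakdowns that tie on the key keep their input order, so both the/mother and them/other survive into the best group.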

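In a word breakdown, junk is identified by a list notation that looks like (:junk "grf"). For example, cat3dog produces a breakdown like ("cat" (:junk "3") "dog"). The code in the junk-quantity and camelize functions uses some structural pattern matching to handle this.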
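As a small Python sketch of that last step (again not the TXR Lisp code itself), the function below turns one breakdown into CamelCase. It assumes junk is represented as a `("junk", text)` tuple, and it uses a plain `isinstance` check where the TXR code uses pattern matching; `to_camel` is a hypothetical name. As in camelize, the first character of every piece is upper-cased, junk included.

```python
def to_camel(breakdown):
    """Join the pieces of one breakdown, upper-casing each piece's first character."""
    out = []
    for piece in breakdown:
        # A junk piece is a ("junk", text) tuple; a word is a plain string.
        text = piece[1] if isinstance(piece, tuple) else piece
        # Slicing leaves the rest of the piece untouched, unlike str.capitalize().
        out.append(text[:1].upper() + text[1:])
    return "".join(out)


print(to_camel(["cat", ("junk", "3"), "dog"]))  # Cat3Dog
```

Upper-casing a digit leaves it unchanged, which is why junk such as 3 passes through as-is.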

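Long inputs take some time; the one below, for instance, takes a few seconds. This can be sped up by compiling the code. The break-word function could also benefit from memoization, because the recursive search computes many of the same suffix breakdowns repeatedly: different combinations of prefixes lead to the same remaining suffix.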

1> (camelize "nowisthetimeforallgoodmentocometotheaidoftheircountry")
("NoWistHeTimeForAllGoodMenToComeToTheAidOfTheirCountry" 
 "NoWistHeTimeForAllGoodMenToComeToTheAidOftHeirCountry"
 "NoWistHeTimeForAllGoodMenToComeTotHeAidOfTheirCountry"
 "NoWistHeTimeForAllGoodMenToComeTotHeAidOftHeirCountry"
 "NoWistHeTimeForaLlGoodMenToComeToTheAidOfTheirCountry"
 "NoWistHeTimeForaLlGoodMenToComeToTheAidOftHeirCountry"
 "NoWistHeTimeForaLlGoodMenToComeTotHeAidOfTheirCountry"
 "NoWistHeTimeForaLlGoodMenToComeTotHeAidOftHeirCountry"
 "NowIsTheTimeForAllGoodMenToComeToTheAidOfTheirCountry"
 "NowIsTheTimeForAllGoodMenToComeToTheAidOftHeirCountry"
 "NowIsTheTimeForAllGoodMenToComeTotHeAidOfTheirCountry"
 "NowIsTheTimeForAllGoodMenToComeTotHeAidOftHeirCountry"
 "NowIsTheTimeForaLlGoodMenToComeToTheAidOfTheirCountry"
 "NowIsTheTimeForaLlGoodMenToComeToTheAidOftHeirCountry"
 "NowIsTheTimeForaLlGoodMenToComeTotHeAidOfTheirCountry"
 "NowIsTheTimeForaLlGoodMenToComeTotHeAidOftHeirCountry")

(Why does it find ForaLlGood? Because ll is listed as a word, since it is an abbreviation used in computing identifiers.)
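The memoization idea mentioned above can be sketched in Python with `functools.lru_cache`. This is an illustration of the technique, not something the TXR Lisp code does: `make_breaker` is a hypothetical helper, a `frozenset` stands in for the trie, junk is tagged as a `("junk", text)` tuple, and results are tuples so that they can be cached safely.

```python
from functools import lru_cache


def make_breaker(words):
    """Build a memoized break_word that closes over a fixed dictionary."""

    @lru_cache(maxsize=None)
    def break_word(text):
        if not text:
            return ()
        n = len(text)
        results = []
        for i in range(1, n + 1):
            head = text[:i]
            if head in words:
                rest = break_word(text[i:])  # cached per distinct suffix
                if rest:
                    results.extend((head,) + r for r in rest)
                else:
                    results.append((head,))
        if results:
            return tuple(results)
        # No dictionary prefix: skip junk until a dictionary word starts.
        for j in range(1, n):
            if any(text[j:k] in words for k in range(j + 1, n + 1)):
                junk = ("junk", text[:j])
                return tuple((junk,) + r for r in break_word(text[j:]))
        return ((("junk", text),),)

    return break_word


# Hypothetical sample dictionary, for illustration only.
breaker = make_breaker(frozenset({"the", "them", "mother", "other"}))
splits = breaker("themother")
print(splits)
print(breaker.cache_info())
```

Each distinct suffix is broken down once and then served from the cache, so the work no longer grows with the number of prefix combinations that reach the same suffix. The number of breakdowns returned can still be large for long inputs.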

Now just the code:

(defvarl %dict% "/usr/share/dict/words")

(defun trie-dict (dict)
  (let ((trie (make-trie)))
    (each ((word dict))
      (if (> (len word) 2)
        (trie-add trie word t)))
    (each ((word '#"a I ad am an as at ax be by do ex go he hi \
                    id if in is it lo me mi my no of oh on or \
                    ow ox pa pi re so to un up us we abs act \
                    addr alloc alt arg attr app arr auth avg \
                    bat bg bin bool brk btn buf char calc cb \
                    cert cfg ch chr circ clr cmd cmp cnt \
                    concat conf config conn cont conv col coll \
                    com cord coord cos csum ctrl ctx cur cpy \
                    db dbg dec def def del dest dev dev diff \
                    dir dis disp doc drv dsc dt en enc env eq err \
                    expr exch exchg fig fmt fp func ge gen gt hex \
                    hdr hor hw id idx iface img inc info init int \
                    lang lat lib le len ll lon math max mem mcu \
                    mid min misc mng mod msg ne net num obj ord \
                    op os param pic pos posix pred pref prev proc \
                    prof ptr pwr px qry rand rect recv rem res \
                    ret rev req rng rx sem sel seq stat std str \
                    sin sqrt src swp sync temp temp tgl tmp tmr \
                    tran trans ts tx txt unix usr val var vert win \
                    xform xmit xref xtract"))
      (trie-add trie word t))
    trie))

(defvarl %trie% (trie-dict (file-get-lines %dict%)))

(defun break-word (trie word)
  (iflet ((lw (len word))
          ((plusp lw)))
    (build
      (let ((i 0)
            (cursor (trie-lookup-begin trie)))
        (whilet ((next (if (< i lw)
                         (trie-lookup-feed-char cursor [word i]))))
          (inc i)
          (set cursor next)
          (if (trie-value-at next)
            (let ((first-word [word 0..i])
                  (rest-words (break-word trie [word i..:])))
              (if rest-words
                (each ((rest-wordlist rest-words))
                  (add ^(,first-word ,*rest-wordlist)))
                (add ^(,first-word))))))
        (unless (get)
          (for ((j 1)) ((and (< j lw) (not (get)))) ((inc j))
            (let ((i j)
                  (cursor (trie-lookup-begin trie)))
              (whilet ((next (if (and (< i lw) (not (get)))
                               (trie-lookup-feed-char cursor [word i]))))
                (inc i)
                (set cursor next)
                (if (trie-value-at next)
                  (let ((junk-word [word 0..j])
                        (rest-words (break-word trie [word j..:])))
                    (each ((rest-wordlist rest-words))
                      (add ^((:junk ,junk-word) ,*rest-wordlist)))))))))
        (unless (get)
          (add ^((:junk ,word))))))))

(defun junk-quantity (broken-word)
  (let ((char-count 0))
    (each ((word broken-word))
      (if-match (:junk @str) word
        (inc char-count (len str))))
    char-count))

(defun camelize (word)
  (if (empty word)
    word
    (flow (break-word %trie% word)
      (mapcar [juxt [juxt junk-quantity len] use])
      (sort @1 : first)
      (partition-by first)
      first
      (mapcar second)
      (mapcar
        (opip (mapcar (do match @(or `@{x 1}@y`
                                     (:junk `@{x 1}@y`))
                                @1
                         `@(upcase-str x)@y`))
              cat-str)))))

Score: 1

Original page content provided by Unix & Linux Stack Exchange.

Original link:

https://unix.stackexchange.com/questions/599889
